The realization that all theories of
behavior are based, either explicitly 
or implicitly, upon some conception 
of nervous system function has made 
it increasingly apparent that a con- 
sideration of the neural mechanisms 
associated with behavior assumes 
the necessary correlation of two 
classes of interdependent variables. 
Recognition of the relatedness of 
neurophysiology and psychology has 
been facilitated by an extension of 
the interest of neurophysiologists 
from static, reflex-like mechanisms 
to central systems with a plasticity 
and time-course more appropriate to 
behavioral events. It has, in turn, 
contributed to an awareness on the 
part of psychologists that common 
neural processes underlie many be- 
havioral phenomena which have been 
operationally defined in terms of in- 
dependent, mutually exclusive cate- 
gories. 

Research on central nervous sys- 
tem structures has repeatedly indi- 
cated that the reticular formation is 
critically involved in many psycho- 
logical functions. The literature in 
this area has become so extensive 
that some criterion must be utilized 
in the selection of topics for cover- 
age. In this review, that criterion 


4 This paper incorporates ideas worked out
in discussion with Ausma Rabe. The author
wishes to express her appreciation to E. L.
Walker, C. J. Smith, and Ausma Rabe for
their critical advice and assistance in the prep-
aration of this manuscript.
has been “behavioral relevance,”
and the aspects of reticular function 
discussed are those considered par- 
ticularly germane to psychological 
phenomena. A brief review of the 
basic structural and functional char- 
acteristics of the reticular formation 
will therefore serve as introduction 
to the following topics: 


1. Interaction of specific and non- 
specific systems; 

2. Central control of afferent in- 
put; 

3. Cortical projections to the re- 
ticular system; 

4. The reticular system and the 
learning process. 


These areas are highly interrelated, 
and the decision to consider a par- 
ticular study in one category, rather 
than another, is, in many cases, quite 
arbitrary. 


ANATOMICAL AND PHYSIOLOGICAL 
PROPERTIES OF THE RETICULAR 
SYSTEM 


Extensive anatomical and physio- 
logical investigations confirm the 
highly differentiated organization of 
the reticular formation. Both struc- 
tural complexity and functional plas- 
ticity indicate its capacity to mediate 
a wide range of behavioral processes. 

The reticular formation may be di- 
vided into two functional systems— 
the brain stem reticular formation 
and the diffusely projecting thalamic 
nuclei. The brain stem reticular formation
includes structures at the
level of the medulla, pons, midbrain, 
subthalamus and hypothalamus 
(Magoun: 1950, 1952a, 1952b, 1954). 
The midbrain reticular formation oc- 
cupies a position of prime impor- 
tance within this system. The dif- 
fusely projecting thalamic nuclei 
(also referred to as the thalamic re- 
ticular system) include ventralis an- 
terior, centre median, nucleus re- 
ticularis, and the intralaminar nu- 
clei (Jasper: 1949, 1954; Jasper & 
Ajmone-Marson, 1952; Starzl & Ma- 
goun, 1951; Starzl & Whitlock, 1952). 
Both the brain stem reticular and 
the thalamic reticular systems, when 
activated, induce a desynchronization 
of resting alpha rhythms through- 
out the cortex. This electrophysio-
logical “arousal” response is, in
general, correlated with an alert con- 
scious state of the organism (Jasper: 
1949, 1954; Jasper & Ajmone-Mar- 
son, 1952; Magoun: 1952a, 1952b, 
1954). 

The behavioral effects which ac- 
company either stimulation or abla- 
tion of the two systems are varied. 
Stimulation of the midbrain reticular 
formation and of the centre median 
nucleus in the thalamus has been 
shown to result in the following se- 
quence of events: at low voltages of 
stimulation, a sleeping animal opened 
his eyes and reacted to auditory and 
visual stimuli; at a slightly higher 
voltage, the animal awoke and looked 
around searchingly in a puzzled manner;
with further increases in intensity,
there was abrupt arousal, [illegible],
flight, fear, agitation, and frantic
efforts to escape. Stimulation of the
intralaminar nuclei in the awake animal
produced an arrest reaction in which the
animal appeared oblivious to sensory
stimuli. This impairment of [illegible]
outlasted the duration of the
stimulus (Hunter & Jasper, 1949).
Slower frequency stimulation of these 
nuclei also produced sleep (Hess, 
1954). 

With the thalamic reticular system 
intact, lesions of the midbrain reticu- 
lar formation produced a chronically 
comatose, hypokinetic animal which 
could not be aroused behaviorally. In 
these preparations, the EEG still 
showed an activation pattern to in- 
tense stimuli, but this activation did 
not outlast the period of application 
of the stimulus. This is in contrast to 
animals in which the brain stem re- 
ticular system was intact, but whose 
specific sensory projection paths had 
been transected. These animals gave 
evidence of both behavioral and elec- 
trophysiological arousal over sus- 
tained periods of time, even though 
the specific sensory impulses failed to 
reach the cortex. Lesions of the tha- 
lamic intralaminar nuclei have also 
been reported to produce lethargy, 
somnolence, and motor disability 
(French & Magoun, 1952; French, 
Von Amerongen, & Magoun, 1952; 
Hanberry & Jasper, 1953; Ingram, 
1952; Lindsley, Bowden, & Magoun, 
1949; Lindsley, Schreiner, Knowles, 
& Magoun, 1950). 

These studies illustrate a series of 
highly critical points. First, it is ap- 
parent that the cortical arousal re- 
sponses induced by stimulation of the 
brain stem reticular system are inde- 
pendent of the specific sensory path- 
ways, since they persist after the lat- 
ter have been transected. Second, 
the arrival of specific sensory im- 
pulses in the cortex is not, in the ab- 
sence of nonspecific reticular activity,
a sufficient condition for the
conscious perception of these im- 
pulses. Third, the interconnections 
between the diffuse thalamic nuclei 
and the cortex are not by themselves 
capable of preserving the waking 
state beyond the immediate period of
bombardment by afferent impulses 
from the periphery. Maintained 
wakefulness depends on the integrity 
of the brain stem reticular formation, 
since, in its absence, activation will 
not outlast the stimulus. However, 
the fact that even with the brain 
stem reticular formation destroyed, 
activation of the cortex by sensory 
stimuli for brief periods is still possi- 
ble, suggests that these stimuli also 
affect the thalamic reticular system. 

It has now been established that all 
sensory modalities, both interocep- 
tive and exteroceptive, give off col- 
laterals to both the brain stem and 
thalamic reticular systems. Thus, 
visual, auditory, olfactory, tactile, 
pain, proprioceptive, and visceral 
stimuli are all capable of activating 
both components of the reticular for- 
mation (Arduini & Moruzzi, 1953; 
Bremer, 1954; French, Verzeano, & 
Magoun, 1952; French, Verzeano, & 
Magoun, 1953; French, Von Amer- 
ongen, & Magoun, 1952; Morin, 
1953; Starzl & Magoun, 1951; Starzl 
et al., 1951a; Starzl et al., 1951b; 
Zanchetti, Wang, & Moruzzi, 1952). 
Auditory stimuli, for example, feed in 
at many levels from below the infe- 
rior colliculi in the midbrain as far for- 
ward as the posterior thalamus 
(Starzl et al., 1951b). The other sensory
modes seem to have similar dis-
persions. It is equally important to 
note that not only do collaterals from 
the specific paths enter the reticular 
formation at several points, but the 
reticular system also influences the 
specific sensory and motor pathways
at many levels, either through direct 
collaterals or by affecting internuncial
neurones (Austin & Jasper, 1950;
Lindsley, 1956; Magoun, 1950). A
further source of reticular activation
is provided by direct projections from
certain cortical areas. These will be
discussed at greater length in a subse-
quent section of this paper.

The collaterals from the specific
sensory paths and the cortical pro-
jections terminate upon both the
brain stem reticular and the thalamic
reticular neurones in a convergent
pattern. It is common to find a
single reticular unit responding to 
two or three sensory modes. How- 
ever, none of the cells recorded from 
could be fired by all types of stimuli 
(French & Hernández-Peón, 1955;
Hernández-Peón & Hagbarth, 1955;
Moruzzi, 1954; Scheibel, Scheibel,
Mollica, & Moruzzi, 1955). Since
both the latency and pattern of firing 
of a single unit vary for different loci 
of stimulation, the reticular cell is, to 
a certain extent, capable of “knowing”
its source of activation.

The many similarities between the 
brain stem reticular formation and 
the diffuse thalamic nuclei should 
not obscure the differentiations which 
also exist. These will become more 
apparent upon a closer consideration 
of the functional and structural char- 
acteristics of the two systems. One 
of the most striking differences con- 
cerns the arousal response itself. Two 
types of activation patterns have 
now been distinguished (Sharpless & 
Jasper, 1955). The first of these, a 
“tonic” reaction, has been referred to 
the brain stem reticular system. 
This reaction varies in duration from 
a few seconds to many minutes, has 
a long latent period following the 
stimulus, is subject to rapid habitua- 
tion, and tends to recover slowly over 
periods of hours or days. The sec- 
ond, a “phasic” pattern, is presumed 
to be a function of the diffuse thalam- 
ic system. It rarely outlasts the 
stimulus by more than 10 or 15 sec- 
onds, has a short latency, is very re- 
sistant to habituation, and once 
habituated, recovers within a few 
minutes. The resistance to adaptation
of the phasic response would
seem to be of special significance. 
Because the arousal mediated by the
thalamic nuclei is of short duration,
it should continue to respond to
repeated stimuli and thus to mediate
a more differentiated attentional
state to a stimulus after the first
gross arousal induced by the brain
stem reticular formation has adapted
out.
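
The contrast between the two activation patterns can be illustrated with a minimal sketch. It assumes, purely for illustration, that each reaction loses a fixed fraction of its amplitude with every repetition of the stimulus; the class name, the function name, and the two habituation rates are hypothetical and are not taken from Sharpless and Jasper (1955).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArousalComponent:
    """Toy arousal component; every parameter value is an illustrative assumption."""
    name: str
    habituation_per_stimulus: float  # fraction of the response lost on each repetition
    response: float = 1.0            # current amplitude (1.0 = unhabituated)

    def stimulate(self) -> float:
        """Return the response to one stimulus, then reduce the stored amplitude."""
        evoked = self.response
        self.response *= (1.0 - self.habituation_per_stimulus)
        return evoked


def responses_to_train(component: ArousalComponent, n_stimuli: int) -> List[float]:
    """Deliver n_stimuli identical stimuli and collect the evoked responses."""
    return [component.stimulate() for _ in range(n_stimuli)]


# Hypothetical rates: the tonic (brain stem) reaction habituates rapidly,
# while the phasic (thalamic) reaction resists habituation.
tonic = ArousalComponent("tonic (brain stem)", habituation_per_stimulus=0.5)
phasic = ArousalComponent("phasic (thalamic)", habituation_per_stimulus=0.05)

tonic_train = responses_to_train(tonic, 10)
phasic_train = responses_to_train(phasic, 10)
```

Under these assumed rates the tonic response has nearly vanished by the tenth stimulus while the phasic response remains substantial, which is the situation in which the thalamic system could continue to mediate attention to a repeated stimulus. The sketch omits recovery over time, which the text describes as slow for the tonic reaction and rapid for the phasic one.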

A comparison of their functional 
projections, both cortical and caudal, 
illustrates further distinctions be- 
tween the brain stem and thalamic 
reticular systems. The ascending 
brain stem reticular units are gen- 
erally assumed to be diffuse cortical 
activators, while the descending re- 
ticular projections are known to be 
more discrete in their action. Corti- 
cifugal units control the transmission 
of the specific evoked potential at 
many levels of all sensory projection 
paths (Galambos, 1956; Hagbarth & 
Kerr, 1954; Hernández-Peón, Scherrer,
& Jouvet, 1956; Lindsley, 1956)
and are capable both of facilitating 
and depressing activity in the motor 
pathways (Bernhaut, Gellhorn, & 
Rasmussen, 1953). There is also 
some structural localization within 
the brain stem reticular formation 
with respect to the reception of stim- 
ulating agents. Adrenaline, for ex- 
ample, has been shown to produce 
the cortical arousal response through 
its action upon a specific portion of 
the midbrain tegmentum. With this 
section of the tegmentum destroyed, 
adrenaline no longer had an effect 
(Rothballer, 1956). 

The discreteness of the brain stem 
reticular formation is particularly 
evident in its descending projections; 
the specificity of the thalamic retic- 
ular system, however, seems to be 
directed cephalically. Although the
diffuse thalamic nuclei activate all
regions of the cortex, including the
primary sensory areas (Jasper, 1949;
Jasper, 1954; Jasper & Ajmone-
Marson, 1952), there is strong evi-
dence of regional localization in their 
cortical projections. The medial 
thalamic nuclei project primarily to 
the anterior cortex, while the lateral 
nuclei activate the posterior portion. 
Stimulation of different points in the 
thalamic reticular system produces 
different patterns of activation in the 
cortex (Jasper, 1954; Jasper, Naquet, 
& King, 1955). In addition to their 
role in desynchronizing the cortex at 
high frequencies of stimulation (i.e., 100
cycles per second), the diffuse thalam- 
ic nuclei are unique in their capac- 
ity to synchronize cortical rhythms 
at low frequencies of stimulation
(i.e., 10 cycles per second). Repeti-
tive stimulation of the thalamic re- 
ticular areas at frequencies corre- 
sponding to those of the natural al- 
pha rhythms produces a cortical re- 
sponse of increasing amplitude—the 
so-called recruiting response (Demp- 
sey & Morison, 1942; Jasper, 1954; 
Morison & Dempsey, 1942). This ex- 
perimentally produced response, 
which is independent of the specific 
sensory pathways, is assumed by Jas- 
per and his co-workers to involve the 
same neural mechanisms as the nat- 
urally occurring alpha rhythms (Jas- 
per, 1954). If this hypothesis is true, 
then both the recruiting response and 
the alpha rhythm are a function of the
regulatory control exercised by the 
thalamic system upon the cortex. 
The possible role of the diffuse thalamic
nuclei in timing cortical rhythms
is particularly relevant to psycholog- 
ical phenomena because of the im- 
portance of the slow alpha-like waves 
as regulators of spike discharge in 
the cortex. Evidence has been pre- 
sented of a fairly high, although not
invariant, correlation between the 
firing of the spike and the phase of 
the cortical slow waves (Gellhorn, 
Koella, & Ballin, 1954; Jasper, 1954; 
Li, Cullen, & Jasper, 1956a; Li, Cul- 
len, & Jasper, 1956b). This relation- 
ship, if valid, would suggest that the 
alpha-like slow waves are able to af- 
fect the discharge of the specific 
evoked potential and thus to influ- 
ence the transmission and elabora- 
tion of stimuli in the cortex (Jasper, 
1954). The possible implications of 
this regulatory function on behav- 
ioral processes such as attention, per- 
ception, and memory will be dis- 
cussed in later sections of this paper. 
This brief review of the properties 
of the reticular systems indicates 
that there are indeed differentiations, 
both structural and functional, be- 
tween its component parts—with 
the brain stem reticular system act- 
ing upon the cortex in a more global 
fashion than the diffuse thalamic 
nuclei. It would seem perfectly ap- 
propriate to equate the arousal func- 
tion of the brain stem reticular for- 
mation with a “generalized drive
state” (Hebb, 1955), since this sys-
tem does possess the anatomical and 
physiological attributes (i.e., control 
of the level of activation of the organ- 
ism by virtue of its sensitivity to ex- 
teroceptive, interoceptive, hormonal, 
and cortical stimuli) which would 
enable it to fulfill the behavioral re- 
quirements of a drive concept. It is 
also apparent, however, that to cor- 
relate the brain stem reticular forma- 
tion uniquely with “drive” is both to 
limit its conceptual value unneces- 
sarily and to disregard its other func- 
tional characteristics (such as its role 
in the control of sensory input). The 
operational procedures which psy-
chologists categorize as “reward” and
“punishment” also serve to activate
the organism and to narrow the behavioral
field. Reward and punish-
ment, then, would appear to have a 
relation to reticular activity which is 
similar to that of drive. Perhaps 
as psychologists clarify the assump- 
tions underlying their concepts, 
many other supposedly independent 
categories previously regarded as 
mutually exclusive will be recognized
as functionally interrelated on the 
basis of a common factor of reticu- 
lar activation. 

Although the attributes of the 
brain stem reticular system qualify it 
as an appropriate neural substrate 
for general behavioral constructs 
such as “drive,” the role of the diffuse 
thalamic nuclei would be obscured 
rather than clarified by such an 
equivalence. Reference has already 
been made to the more flexible opera- 
tion of this system—its ability to reg- 
ulate cortical excitability, the locali- 
zation of its cortical projections, its 
suppressive and facilitatory effects 
upon spike discharge, etc. A system 
such as this, functionally organized 
in a manner which would permit it to 
control the continuum of conscious- 
ness and to serve as a selective mech- 
anism for the facilitation of certain 
perceptions, sensations, and memo- 
ries, as well as the inhibition of 
others, would seem a rich source in- 
deed for the neural mechanisms 
which support highly differentiated 
behavior. 


INTERACTION BETWEEN SPECIFIC 
AND NONSPECIFIC SYSTEMS 


In his 1954 article on “Drive and 
the Conceptual Nervous System,” 
Hebb (1955) proposed a curvilinear 
relationship between drive or arousal, 
defined as the “level of nonspecific 
cortical bombardment through the 
ascending reticular system,” and cue, 
defined in terms of the cortical recep- 
tion and elaboration of the specific 
sensory evoked potential, as they relate
to learning. In defining his axes
both psychologically and neurophysi- 
ologically, Hebb has done three 
things: he has summarized the psy- 
chological literature indicating a cur- 
vilinear relationship between such 
variables as induced muscle tension 
and the learning of nonsense sylla- 
bles, memory span, etc.; he has pos- 
tulated a correspondence between 
physiological and psychological vari- 
ables which permits psychological 
concepts to be operationalized in 
terms of measurable neurophysio- 
logical variables; and he has focused 
attention upon the interaction of the 
specific and nonspecific systems as 
they affect such processes as learn- 
ing and memory. This attempt to re- 
late nonspecific input to learning 
would seem to be a natural out- 
growth of Hebb’s (1949) earlier 
“dual process” theory of memory
and learning. If a certain period of 
reverberatory neural activity is an 
essential prerequisite to the forma- 
tion of the permanent structural 
trace upon which learning and mem- 
ory depend, then the level of non- 
specific input, as a critical factor de- 
termining the extent and duration of 
the elaboration of the specific evoked 
potential, should indeed bear a law- 
ful relationship to learning and mem- 
ory. 
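
The curvilinear relationship can be made concrete with a small sketch. It assumes the inverted-U form commonly associated with Hebb's proposal and models it, for illustration only, as a Gaussian function of arousal; the function name `cue_function`, the 0 to 1 arousal scale, and the `optimum` and `width` values are hypothetical and are not drawn from Hebb (1955).

```python
import math


def cue_function(arousal: float, optimum: float = 0.5, width: float = 0.2) -> float:
    """Hypothetical inverted-U: efficiency of cue function peaks at `optimum`.

    The Gaussian shape and the parameter values are assumptions chosen for
    illustration; the text specifies only that the relation is curvilinear.
    """
    return math.exp(-((arousal - optimum) ** 2) / (2.0 * width ** 2))


# Arousal expressed on an arbitrary 0..1 scale (an assumption of this sketch).
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
efficiency = [cue_function(a) for a in levels]
```

In this sketch efficiency rises as arousal approaches the assumed optimum and falls symmetrically beyond it, so that both very low and very high levels of nonspecific bombardment are associated with poor cue function. Any real curve need not be symmetric.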

That the interaction of specific 
and nonspecific activity may be rele- 
vant to perceptual processes, as well 
as learning and memory, is indicated 
by the evidence that the mere arrival 
of the afferent sensory volleys in the 
cortex is not sufficient to insure 
conscious sensation. Under deep an- 
esthesia, which depresses reticular 
activity (Magoun, 1954), the spe- 
cific evoked potentials appear in the 
sensory receiving areas in an en- 
hanced form under conditions which
would preclude their conscious recep-
tion (Gellhorn, 1954; Lindsley, 1956). 
It would appear, then, that the clas- 
sical afferent systems transmit the in- 
formation which forms the specific 
content of consciousness, but do not 
per se mediate awareness (Gellhorn, 
1954). Rather, it is activity in the 
nonspecific reticular systems which 
provides the essential neurophysio- 
logical condition for the processes of 
perception, attention, and sensation. 
If this viewpoint is valid, and if one 
is willing to accept the further as- 
sumption that the amplitude of the 
specific evoked potential under nor- 
mal (unanesthetized) conditions pro- 
vides an index of the extent to which 
sensory input is transmitted and 
hence perceived by the organism, 
then varying levels of nonspecific ac- 
tivity should, through their effect 
upon the specific evoked potential, 
be associated with changes in percep- 
tion. This hypothesis is supported 
by a study on humans utilizing both 
recordings of the visual specific 
evoked potential and phenomenal 
report, in which a decreased ampli- 
tude of the evoked potential was 
found to be correlated with a phe- 
nomenal report of decreased inten- 
sity of light (Hernández-Peón & 
Donoso, 1957). Unfortunately, stud- 
ies correlating neurophysiological re- 
cordings and observer’s reports are as 
yet rare in the literature. The data 
to be reviewed in this section are 
therefore primarily neurophysiolog- 
ical in nature and any psychological 
implications derived from them must 
be, to a large extent, based upon 
theoretical assumptions rather than 
direct experimental proof of neuro- 
physiological and behavioral equiva- 
lence. 

Anatomically, the specific and 
nonspecific pathways represent two 
distinct systems which are closely interrelated
at all levels. The reticular
formation not only draws collaterals 
from the ascending sensory paths 
throughout its course (French et al., 
1953; Magoun: 1952, 1954; Starzl, 
et al.: 1951a, 1951b), but also feeds 
back into the sensory paths at sev- 
eral levels (Nakao & Koella, 1956). 
In the auditory projection pathways, 
for example, reticular stimulation has 
been shown to affect the size of the 
specific evoked potential at the coch- 
lea (Galambos, 1956), the dorsal 
cochlear nucleus (Hernández-Peón,
Jouvet, & Scherrer, 1957), and the 
medial geniculate (Nakao & Koella, 
1956). Within the cortex itself, at 
least two types of convergent organ- 
ization have been found to exist. In 
the sensory cortex, the specific and 
nonspecific afferent fibers terminated 
on different neurones. Cells which 
responded to the stimulation of non- 
specific afferents could not be fired by 
specific afferents, and vice versa. In- 
teraction occurred primarily through 
a series of interneurones which were 
facilitated by the nonspecific volleys 
and in turn affected the excitability 
of the specific elements (Li & Jasper, 
1953). In the visual cortex, some 
cells have been found to respond to 
both specific and nonspecific stim- 
ulation, while others were activated 
only by reticular volleys (Lindsley, 
1956). 

The interaction component con- 
tributed by the classical sensory sys- 
tems is spatially localized as a conse- 
quence of the high degree of topo- 
graphical representation which char- 
acterizes the projection patterns of 
these systems. With regard to the 
nonspecific structures, there is gen- 
eral agreement that the arousal elic- 
ited by the brain stem reticular sys- 
tem is of a diffuse nature (Jasper, 
1954; Magoun, 1954). However, the
degree of cortical localization of the
nonspecific thalamic reticular pro-
jections has been a matter of con- 
troversy. Starzl et al. (Starzl & Ma- 
goun, 1951; Starzl et al., 1951a; 
Starzl & Whitlock, 1952) failed to 
find any degree of topographical lo- 
calization, while Jasper and his co- 
workers (Hanberry & Jasper, 1953; 
Jasper: 1949, 1954; Jasper & Ajmone- 
Marson, 1952) have repeatedly re- 
ported a definite organization within 
the thalamic nuclei, with the stimula- 
tion of different loci producing vary- 
ing patterns of cortical activation. 
This conflict appears to have been re- 
solved by the Jasper, Naquet, and 
King study (1955) in which it was 
determined that under appropriate 
anesthesia and with just-threshold
intensities of stimulation, discretely 
localized patterns of cortical re- 
sponse could be induced by stimula- 
tion of the diffuse thalamic system. 
A structural basis for these results 
was provided by Chow’s work on re- 
gional degeneration within the tha- 
lamic reticular nucleus as a conse- 
quence of selective cortical ablations, 
which indicated that extirpation of 
each cortical projection area is fol- 
lowed by retrograde degeneration of 
both its specific thalamic relay nu- 
cleus and of a localized adjacent por- 
tion of the nucleus reticularis. Chow 
(1952) concluded, on the basis of his 
findings, that although the reticular 
nucleus taken as a whole may project 
to the entire cortex, “there is an or-
derly arrangement of connections be- 
tween different sectors of the reticu- 
lar nucleus and different cortical 
fields.” There exists, then, anatom- 
ical provision for a selectively local- 
ized activation of such areas as the 
striate cortex, temporal lobe, audi- 
tory cortex, sensori-motor cortex, 
etc., by the diffuse thalamic nuclei. 
This viewpoint is also shared by Gas- 
taut (1954), whose extensive EEG 
studies led him to conclude that in
addition to specific fibers from the re- 
lay and association nuclei of the thal- 
amus, each cortical region received 
topographically organized nonspecif- 
ic fibers from the intralaminar and 
reticular nuclei of the thalamus. 
Whether the corticopetal fibers uti- 
lized by the thalamic reticular sys- 
tem are also shared by the brain 
stem reticular system, or whether the 
localized and diffuse arousal systems 
are independent of each other in their 
cortical projections remains to be de- 
cided. However, the functional value 
of a more differentiated arousal sys- 
tem capable of localized control of 
the specific projection areas is un- 
questionable. Through selective fa- 
cilitation or inhibition of various sen- 
sory inputs in the cortex, such an 
anatomical arrangement would pro- 
vide discriminative control over the 
elaboration of the specific sensory po- 
tentials at a cortical level and thus 
permit greater flexibility and a more 
finely graded regulation of processes 
involving selective awareness, per- 
ception, and memory than is possible 
through peripheral sensory control 
alone. 

Neurophysiological data on inter- 
action may be divided into three 
categories: (a) interaction between 
the two nonspecific systems; (b) in- 
teraction between the specific sen- 
sory systems in the reticular forma- 
tion; (c) interaction between the spe- 
cific and nonspecific systems. 

In studies of the relations between 
the two nonspecific systems, the 
arousal response initiated by the 
brain stem reticular formation has 
been shown to block the cortical re- 
cruiting response evoked by the dif- 
fuse thalamic nuclei (Gauthier, Par- 
ma, & Zanchetti, 1956; Gellhorn, et 
al., 1954; Jasper et al., 1955). Wheth- 
er this was due to a direct desyn- 


chronization of the thalamic nuclei 
by the brain stem reticular forma- 
tion, to a prepotent effect upon the 
cortical neurones by the brain stem 
system, or merely to an inability to 
distinguish the two effects electro- 
graphically is not certain (Morrell & 
Jasper, 1956). If the blocking effect 
is a true one, it would seem to suggest 
that gross activation or arousal is 
inimical to the optimal functioning 
of the regulatory effects mediated by 
the thalamic reticular system. This 
overshadowing of the more differ- 
entiated functions of the thalamic 
nuclei by the diffuse arousal response 
of the brain stem may have its be- 
havioral counterpart in the many 
failures of discrimination which occur 
under high emotion and excitement. 

Both facilitatory and inhibitory in- 
teractions among the ascending sen- 
sory systems, as well as between the 
ascending systems and the cortici- 
fugal projections, have been demon- 
strated in the reticular formation it- 
self. Simultaneous convergence of a 
peripheral sensory impulse and a cor- 
ticifugal potential upon a reticular 
unit led to facilitation of the reticular 
response, while a reticular neurone 
which had been fired by either a pre- 
ceding corticifugal or a peripheral 
sensory volley failed to respond to a 
subsequent sensory stimulus. It was 
noted that the depression of reticular 
response following repetitive cortical 
or afferent stimuli was particularly 
severe. Because of the density of re- 
ticular neurones, facilitatory field ef- 
fects occurred in neurones near those 
which were directly fired. These 
subliminal fringe areas then gave rise 
to summated potentials when fired 
by subsequent test stimuli (Amas- 
sian & Devito, 1954; Hernández- 
Peón & Hagbarth, 1955; Moruzzi, 
1954; Murphy & Gellhorn, 1945). 
The large degree of convergence of 
afferent and cortical projections upon
reticular units, and the suppressive
and facilitatory effects to which this
competition gives rise, produce an
area of sensory and cortical interac- 
tion to which simple input-output 
conceptions of nerve impulse trans- 
mission are not applicable. Reticular 
units may exhibit input without out- 
put (due to suppressive and occlusive 
effects), as well as output without in- 
put (due to a summation of spon- 
taneous activity), and thus may 
serve both integrative and pace- 
making functions (Fessard, 1954). 
The full behavioral consequences of 
these properties, particularly with re- 
gard to levels of awareness and sen- 
sory interaction, remain to be inves- 
tigated. 
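
The facilitation and depression described above can be sketched with a toy threshold unit. This is a simplified illustration under stated assumptions, not a model taken from the studies cited: the class `ReticularUnit`, the threshold of 1.0, the input strength of 0.6, and the 50 ms recovery period are all hypothetical.

```python
from typing import List, Optional


class ReticularUnit:
    """Toy threshold unit; threshold and recovery values are illustrative assumptions."""

    def __init__(self, threshold: float = 1.0, recovery_ms: float = 50.0) -> None:
        self.threshold = threshold
        self.recovery_ms = recovery_ms
        self.last_fired_ms: Optional[float] = None

    def receive(self, time_ms: float, inputs: List[float]) -> bool:
        """Return True if the summed inputs arriving at time_ms fire the unit."""
        in_recovery = (
            self.last_fired_ms is not None
            and time_ms - self.last_fired_ms < self.recovery_ms
        )
        if in_recovery:
            return False  # a preceding volley has left the unit unresponsive
        if sum(inputs) >= self.threshold:
            self.last_fired_ms = time_ms
            return True
        return False


unit = ReticularUnit()
alone = unit.receive(0.0, [0.6])             # a single subthreshold input
together = unit.receive(100.0, [0.6, 0.6])   # sensory and corticifugal inputs converge
followup = unit.receive(120.0, [0.6, 0.6])   # arrives during the recovery period
later = unit.receive(200.0, [0.6, 0.6])      # arrives after recovery
```

Here a single input fails to fire the unit, two simultaneous inputs succeed (facilitation by convergence), and an identical volley arriving shortly after a discharge is ineffective (input without output). Spontaneous activity and field effects on neighboring neurones are not represented.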

Interaction between specific and 
nonspecific systems within the cor- 
tex has also produced varied and 
complex phenomena depending upon 
the timing, intensity, and loci of 
stimulation. The generalized cortical 
activation produced by a single shock 
delivered to a diffuse thalamic nu- 
cleus had a facilitatory effect upon 
the responsiveness of the areas in- 
volved to subsequent specific sensory 
volleys (Li & Jasper, 1953). How- 
ever, with rapid and intense stimula- 
tion of the thalamic reticular system, 
both the secondary surface-negative 
wave of the specific evoked potential 
(which is dependent upon the activa- 
tion of the cortical apical dendrites) 
and the repetitive after-discharges 
(which are transmitted via thalamo- 
cortical reverberatory circuits) were 
abolished (Jasper: 1949, 1954; Jasper 
& Ajmone-Marson, 1952). Thus, the 
diffuse thalamic nuclei seemed cap- 
able both of facilitating the reception 
of the specific sensory impulses in the 
cortex, as indicated by the increase in
the number of spike discharges to a
stimulus, and of suppressing the
elaboration of these afferent impulses
through cortical and thalamo-cortical 
circuits. 

Varied cortical effects have also 
been induced by brain stem reticular 
stimulation. On the one hand, the 
primary evoked potentials elicited by 
excitation of a peripheral nerve were 
reduced or blocked in the cortex by 
intense reticular arousal (Gauthier, 
et al., 1956). On the other, hypo- 
thalamic stimulation interacted with 
the specific sensory impulse to give 
both an intensification of the ampli- 
tude of the cortical evoked response 
and an increase in the area from 
which it was recorded (Gellhorn, et 
al., 1954). This interaction of hypo- 
thalamic stimulation with a specific 
sensory stimulus occurred chiefly 
within the corresponding projection 
area (i.e., hypothalamic interaction 
with an acoustic stimulus affected 
the auditory projection area), but, to 
a lesser extent, it increased the re- 
sponse of another region (i.e., the vis- 
ual area) to stimulation. The many 
apparent discrepancies in results 
from stimulation of the brain stem 
reticular formation may well be due 
to differences in the levels of anes- 
thesia and intensities of stimulation 
utilized, for both are critical factors 
in inducing variability of response. 
To cite an example, the generalized 
reticular activation induced by mild 
nociceptive stimuli enhanced the size 
of the primary auditory and vis- 
ual potentials under light anesthesia. 
When depth of anesthesia was in- 
creased, a depression of these re- 
sponses occurred (Bernhaut, et al., 
1953). 

The extensive research of Gellhorn 
has been instrumental in clarifying 
many of the complexities associated 
with the interaction of specific and 
nonspecific impulses (Bernhaut, et 
al., 1953; Gellhorn: 1952, 1954; 
Gellhorn, et al., 1954; Nakao & Koella, 
1956). A key finding has been the 
ordering of the various sensory modes 
with respect to their effectiveness in 
inducing cortical activation (Bern- 
haut, et al., 1953). Nociception and 
proprioception have been found to 
induce the most intense and wide- 
spread cortical activation, with audi- 
tory and visual stimuli producing the 
least. Gellhorn (Bernhaut, et al., 
1953) also reported two kinds of cor- 
tical activation patterns in response 
to stimulation. The first, a general- 
ized arousal throughout all areas of 
the cortex, which was accompanied 
by excitation in the hypothalamic 
portion of the brain stem reticular 
system, occurred mainly to nocicep- 
tive and proprioceptive stimuli. The 
second, a specific activation pattern, 
with excitation confined to the spe- 
cific sensory projection area, was not 
accompanied by hypothalamic exci- 
tation in most instances. It was given 
predominantly by visual and audi- 
tory stimuli. The comparative order 
of effectiveness of the sensory modes 
in their ability to excite the reticular 
formation is particularly interesting 
when considered in relation to evi- 
dence concerning the relative ease of 
establishing both electrographic and 
behavioral conditioned responses to 
auditory as opposed to visual stimuli. 
Chow, et al. (1957), for example, have 
reported that an avoidance response 
which was established in 450 trials to 
light as the CS required only 150 
trials for tone. Morrell and Jasper 
(1956) gave a similar order of diffi- 
culty for conditioned alpha flicker re- 
sponses: the mean for visual CR’s was 
13.2 trials; for auditory, 8.2. A par- 
tial explanation of these results may 
lie in the finding that auditory stimuli 
are, in general, more effective activa- 
tors of the reticular formation and 
hence of the cortical projection and 
elaboration areas than are visual 
stimuli (Bernhaut, et al., 1953). If 
the formation of the memory trace is 
dependent upon a certain level of non- 
specific input, then a class of stimuli 
with a strong inherent capacity for 
providing its specific sensory com- 
ponents with a high level of reticular 
activation would be functionally 
equipped to produce faster learning 
than a class of stimuli dependent up- 
on random external sources for its 
elaboration. 

Direct experimental evidence of 
the effect of interaction between the 
specific and nonspecific systems on 
the learning process is sparse. Chiles 
(1954) has reported that stimulation 
of the diffuse thalamic nuclei in- 
creased variability in a discrimina- 
tion task, and Gengerelli and Cullen 
(1955) have presented some evidence 
of increased learning as a result of 
stimulation of presumably cortical 
structures. The most impressive 
work to date has been that of Mahut 
(1957). Rats were run in the Hebb- 
Williams maze under hunger motiva- 
tion for 10 trials a day. Immediately 
following each trial, they were stim- 
ulated for 15 seconds while eating in 
the goal box with .25 volt, 60 cycle 
current, delivered through bipolar 
electrodes in the intralaminar and 
midline thalamic nuclei. No visible 
disruption of feeding behavior oc- 
curred with this procedure. How- 
ever, there was a highly significant 
impairment in learning for the experi- 
mental rats as compared to control 
animals. Additional controls indi- 
cated that there was no difference in 
the trial latencies for the two groups, 
and that stimulation itself carried 
neither pleasurable nor aversive qual- 
ities since the experimental animals 
gave only spontaneous bar pressing 
rates when tested for self-stimulation 
in the Skinner Box. Mahut has interpreted 
her results as indicating an interference 
with the neural circuits involved in the 
memory trace. 

The relatively small currents which 
were utilized in Mahut’s study, and 
the striking decrements which were 
produced by stimulation immediately 
following a trial suggest the possi- 
bility that massive peripheral stim- 
uli capable of activating the reticular 
system to a similar extent might also 
produce decrements in memory if ap- 
plied during the critical time inter- 
val. The massive nonspecific input 
associated with severe punishment, 
for example, may affect learning by 
disrupting the consolidation process 
of the preceding response. In a mul- 
tiple choice situation with punish- 
ment for errors, this would have the 
effect of disrupting the neural trace 
of the immediately antecedent incorrect 
response, while permitting 
the consolidation of the alternative 
traces to proceed comparatively un- 
affected. This schema would also ac- 
count for the relative efficiency of 
spaced as opposed to massed learning 
under punishment. If the neural 
trace is most vulnerable to interfer- 
ence during a certain limited critical 
interval, then massing trials would 
increase the probability of exposure 
of the trace associated with the non- 
punished response to the disruptive 
effects of high levels of nonspecific 
input. These speculations are given 
some support in a study by Duncan 
(1949), who administered traumatic 
shocks of 85 volts to the hind legs of 
rats at intervals of 20 seconds, 60 
seconds, 4 minutes, and 45 minutes 
after one trial per day in an avoid- 
ance box. At the end of 18 days, the 
20-second group showed a significant 
impairment in learning which was of 
a magnitude similar to that of an experimental 
group given electroconvulsive 
shock of equal voltage. The 
other three groups showed no significant 
decrement. 

The precise perceptual correlates 
of interaction also remain ambiguous. 
A possible relationship between 
levels of reticular activation and per- 
ceived brightness is suggested by evi- 
dence that stimulation of the mid- 
brain reticular formation led to a 
great facilitation of the response of 
individual retinal units to a test flash 
of light (Granit, 1955). This facili- 
tation involved both an increase in 
impulse frequency and an extended 
duration of discharge. Since impulse 
frequency is directly related to stim- 
ulus intensity, which is correlated 
with perceived brightness, level of 
reticular activation may be one of the 
determinants of perceived brightness. 
A study by Fuster (1958) lends some 
support to this hypothesis. Monkeys 
stimulated in the midbrain reticular 
formation made a higher percentage 
of correct responses and showed 
shorter reaction times in a discrimina- 
tion problem involving the presenta- 
tion of stereometric objects at expo- 
sures ranging from 10-40 milliseconds 
than did the control group. However, 
the failure to control for pupillary re- 
sponse, the absence of recordings of 
the specific evoked potential in the 
specific sensory tracts and cortical 
receiving areas, and the application 
of the electrical stimulus during the 
response interval, make it impossible 
to decide whether the facilitation is a 
function of the peripheral receptor, 
the central organization of the perception, 
or the motor response. 

Sensory deprivation and photic 
driving studies have also provided 
rich data for conjecture. Heron, 
Doane, and Scott (1956) noted that 
their eight Ss who reported hallucina- 
tions during isolation showed EEG's 
containing slow frequency delta waves 
both during and after the isolation 
period. Hallucinatory activity as a 
consequence of photic driving has 
also been a frequent occurrence, and, 
in one case, at least, these vivid vis- 
ual images were accompanied by 
high voltage, irregular slow waves at 
a frequency from 4-8 cycles per sec- 
ond in the temporal and temporo- 
occipital area (Mundy-Castle, 1953). 
These studies seem to indicate a rela- 
tionship between low level nonspecific 
input, either as a consequence of lack 
of sensory input or from synchronous 
“driving” stimuli, and hallucinatory 
activity. Whether the hallucinatory 
activity which is coincident to so 
many psychotic states is also a result 
of low level, relatively synchronized 
nonspecific activity, perhaps induced 
by centrally initiated sensory cutoff, 
remains a problem for further inves- 
tigation. 

The studies reviewed in this sec- 
tion provide strong neurophysiologi- 
cal evidence of interaction between 
the specific sensory and nonspecific 
reticular systems. There can be little 
doubt that the size and frequency of 
the specific evoked potential are af- 
fected by activity in the nonspecific 
activating systems. Despite the 
stimulating speculations of Hebb 
(1949; 1955) and Gellhorn (1952; 
1954), however, the actual effects of 
this interaction upon behavioral phe- 
nomena such as attention, percep- 
tion, memory, etc., remain essentially 
unknown and constitute a challenge 
to the ingenuity and perseverance of 
psychologists. 


CENTRAL CONTROL OF AFFERENT 
INPUT 


The restrictive and selective nature 
of attentional processes has long been 
recognized by psychologists and psy- 
choanalysts alike. Proponents of 
nonreinforcement theories of learn- 
ing, for example, have attempted to 
explain the failure of “latent” learning 
under conditions of strong drive 
by arguing that the irrelevant incen- 
tive was not perceived under high 
motivation (Thistlethwaite, 1951), 
while psychoanalytic theorists have 
proposed the existence of a “stimulus 
barrier,” which permits the organism 
to protect itself against traumatic 
stimuli by shutting off the function 
of perception (Fenichel, 1945). The 
operation of these mechanisms has 
been generally assumed to occur 
either prior to the impingement of the 
stimulus upon the peripheral re- 
ceptor (i.e., the animal did not “look 
at” or “see” the relevant discriminanda), 
or subsequent to its arrival 
in the higher centers of the brain 
(i.e., the organism did not “pay at- 
tention” to what it “saw”). In other 
words, information transmitted by 
the specific pathways was thought to 
remain constant throughout its course 
subsequent to its reception at the 
peripheral sense organ and prior to 
its elaboration in the cortex. That 
nondecremental transmission of in- 
put is far from universal has been 
consistently demonstrated by recent 
work on the centrifugal regulation of 
afferent influx, which indicates that 
sensory impulses may be regulated 
and controlled at every level from the 
receptor upward. 

Central control of afferent impulses 
at a higher nervous system level was 
first demonstrated by Granit and 
Kaada (1952) with respect to the 
muscle spindle—a proprioceptive re- 
ceptor. Both facilitation and inhibi- 
tion of discharge were obtained by 
stimulation of gamma efferent fibers 
through the brain stem reticular for- 
mation (Granit & Henatsch, 1956). 
Facilitatory and inhibitory effects 
have also been reported in retinal 
ganglion cells and in the optic tract 
as a result of reticular stimulation 
(Dodt, 1956; Granit: 1955a, 1955b). 
In the spinal cord, stimulation of the 
bulbar and midbrain reticular formation, 
the anterior cingulate gyrus, and 
the sensori-motor cortex depressed 
or inhibited conduction in both the 
dorsal and ventral columns, which include 
fibers mediating both kinesthetic 
and cutaneous pressure (Hagbarth 
& Kerr, 1954). Tactile impulses 
were also depressed for periods as long 
as 80 seconds at the level of the gracile 
nucleus of the medulla by stimula- 
tion of reticular areas and the sensori- 
motor cortex (Scherrer & Hernández- 
Peón, 1955). That these effects may 
be obtained from the sensori-motor 
area as well as the anterior cingulate 
gyrus indicates that cortical projec- 
tions to the reticular formation are 
capable of initiating the regulation of 
specific evoked potentials, as well as 
of blocking conduction of nonspecific 
impulses within the reticular formation 
itself (Adey, Segundo, & Livingston, 
1957). Transmission in the 
olfactory and auditory pathways was 
similarly subject to central control: stimulation 
of the amygdala, the prepyriform 
cortex, and the anterior commissure 
exerted a depressive influence on electrical 
activity in the olfactory bulb 
(Kerr & Hagbarth, 1955), and stimulation 
of the bulbar reticular formation 
suppressed the response of 
the cochlea to auditory stimuli (Ga- 
lambos, 1956). It would appear, 
then, that all sense modalities have 
some means of centrifugal control, 
either at the level of the receptor it- 
self, at the first or second synapses, 
or at some more centrally located sta- 
tion along the afferent pathways 
(Lindsley, 1956). 

That these suppressive and facili- 
tatory effects are not merely artifacts 
induced by unphysiological electrical 
stimulation has now been shown by 
experiments utilizing natural stimuli 
and unanesthetized animals with 
chronically implanted electrodes in 
the sensory projection paths. Hernández-Peón, 
Scherrer, and Jouvet 
(1956), recording the responses of the 
dorsal cochlear nucleus to auditory 
clicks, reported that these specific 
evoked potentials were practically 
abolished when a visual stimulus— 
two mice in a closed bottle—elicited 
behavioral evidence of attention from 
the cat. When the mice were re- 
moved, the auditory responses re- 
turned to the same order of magni- 
tude as the original control responses. 
Similarly, olfactory stimuli and a 
nociceptive shock, which apparently 
distracted the animal's attention, re- 
sulted in a reduction of auditory 
evoked potentials in the cochlear 
nucleus. If it is valid to assume that 
subjective awareness of a stimulus is 
contingent upon the transmission of 
its concomitant evoked potential 
through the specific projection pathways 
to higher diencephalic and cortical 
centers, then it is possible to conclude 
that the cat’s “hearing” of the 
click was disturbed when it was dis- 
tracted by other stimuli. Photically 
evoked potentials were also reduced 
or abolished when the animal focused 
on an acoustic or olfactory stimulus 
(Hernández-Peón, Guzmán-Flores, 
Alcaraz, & Fernández-Guardiola, 1957). 
This reduction in the magnitude of 
the visual evoked potential occurred 
both within the specific sensory path- 
ways (i.e., the optic tract, lateral ge- 
niculate body, and striate cortex) and 
in the midbrain reticular formation. 
Since it occurred at a level peripheral 
to the optic tract, the blocking effect 
was assumed to take place in the 
retina. A similar blockade of photic 
potentials was observed when atten- 
tive behavior was elicited by stimu- 
lating the brain stem reticular forma- 
tion. 

A further correlation between af- 
ferent signals and conscious sensation 
has been reported by Hernández-Peón 
and Donosa (1957) with human 
Ss. Four patients with electrodes 
chronically implanted in the occipital 
lobes were subjected to a series of 
flashes of constant intensity. In gen- 
eral, the magnitude of the evoked po- 
tentials which were recorded varied 
with the reported perception of the 
intensity of the light. When the pa- 
tients’ attention was engaged by the 
presentation of such mental tasks as 
arithmetic problems or instructions 
to recall visual imagery, the visual 
evoked potential was reduced or 
abolished. The potentials recovered 
their original size after the comple- 
tion of the task. 

Analysis of the data on regulation 
of sensory input (including that re- 
viewed in the previous section on in- 
teraction of specific and nonspecific 
impulses) indicates that there are 
mechanisms available for three types 
of control of input: 

1. At the level of the sensory re- 
ceptor, the spinal cord, and in the 
specific sensory paths prior to the 
point at which they give off collat- 
erals to the reticular formation, both 
the arousal and the cue effects of 
stimuli may be controlled. 

2. In the reticular formation itself, 
the arousal effects of stimuli may be 
enhanced or inhibited. 

3. In the cortex, the cue value of 
stimuli may be affected. 

These mechanisms of sensory con- 
trol provide a neurophysiological 
basis for phenomena such as the re- 
pressive defences, concentration, hys- 
terical anesthesias, etc., which in- 
volve a selective restriction of sensory 
input in their operation. They also 
indicate that the interpretation of 
cortical events must be undertaken 
with caution in the absence of record- 
ings of afferent influx to the cortex. 

This section has dealt with central 
control of afferent input at a reticular 
level. However, the complex discrim- 
inative behavior of interest to psy- 
chologists is generally assumed to in- 
volve the cortex. It is highly rele- 
vant, therefore, to inquire to what ex- 
tent cortical areas may exert an ef- 
fect upon the reticular formation and 
its regulatory mechanisms and thus 
participate in the control of periph- 
eral sensory input. The following 
section will review the existing litera- 
ture which bears on this question. 


CORTICAL PROJECTIONS TO THE 
RETICULAR SYSTEM 


The tendency of many behavior 
theories to base their motivational 
constructs upon the so-called “pri- 
mary” biological needs (Hull, 1943), 
to the exclusion of autonomous cog- 
nitive processes, has been subjected 
to increasing criticism of late. This 
revival of interest in cognitive moti- 
vation by psychologists has found 
ample support from neurophysiology, 
where investigations of cortical pro- 
jections to the reticular formation 
strikingly demonstrate the fallacy of 
theories which categorize higher men- 
tal functions as subordinate deriva- 
tives of more “basic” need states. 

Within the past few years, the im- 
portance of cortical projections to the 
brain stem reticular formation and 
the diffuse thalamic nuclei has been 
repeatedly confirmed by studies 
which indicate that the potentials in- 
duced throughout the reticular sys- 
tems by cortical projections are 
larger, more widespread, and have a 
shorter latency than those evoked 
directly by any sensory mode (Hernández-Peón 
& Hagbarth, 1955). 
These cortical connections to the 
reticular formation arise only in cer- 
tain limited regions: the frontal ocu- 
lomotor area, the orbital surface of 
the frontal lobe, the sensori-motor 
cortex, the superior temporal gyrus, 
the cingulate gyrus, and the hippocampal 
gyrus (French & Hernández-Peón, 
1955; French, Livingston, & 
Hernández-Peón, 1953; Jasper, et al., 
1952; Livingston, French, & Hernández-Peón, 
1953; Segundo, Naquet, & 
Buser, 1955). Stimulation of the last 
three areas has resulted in the most 
intense and widespread arousal of all 
points explored (Segundo, Naquet, & 
Buser, 1955), and it is particularly 
interesting to note that these struc- 
tures are currently believed to be 
critical to the elaboration of emo- 
tional and memory processes (Jasper, 
Gloor, & Milner, 1956; Lindsley, 
1956). 

The importance of these cortical 
connections can scarcely be overem- 
phasized, for they provide a means 
whereby the cortex can control the 
activating mechanisms of the brain 
stem and thus influence its own level 
of arousal (French & Hernández- 
Peón, 1955). This effect is particu- 
larly relevant to sleep phenomena 
(Bremer, 1954) and to the role of 
learned stimuli in directing behavior. 
The efficacy of cortical processes in 
inducing wakefulness has been con- 
firmed by studies in which threshold 
electrical stimulation of areas with 
projections to the reticular formation 
aroused a sleeping animal and pro- 
duced cortical desynchronization just 
as effectively as an intense peripheral 
sensory stimulus (Segundo, Arana, & 
French, 1955). Of even greater import 
to behavior is the role of cortical 
projections in providing a mediating 
mechanism whereby learned, mean- 
ingful stimuli may influence the or- 
ganism’s activity in the waking state. 
That this influence is a powerful one 
is evident even in the behavior of 
relatively “ungifted” animals. Thus, 
the appearance of a human being may 
come to elicit a far more consistent 
and intense arousal from the rabbit 
than strong sensory stimuli such as 
loud noises and bright lights (Gan- 
gloff & Monnier, 1956). 

Although corticifugal fibers to the 
reticular formation originate in dis- 
crete cortical areas, they terminate in 
overlapping projection areas within 
the reticular system (Amassian & 
Devito, 1954; Hernández-Peón & 
Hagbarth, 1955; Moruzzi, 1954; 
Scheibel, et al., 1955). There is, 
therefore, an extensive degree of con- 
vergence of both cortical and afferent 
impulses upon individual reticular 
units. However, single unit analysis 
of reticular neurones has indicated 
that individual cells respond with dif- 
ferent patterns and latencies of firing, 
depending upon the source of stimu- 
lation (Amassian & Devito, 1954; 
Hernández-Peón & Hagbarth, 1955). 
Information may also be conveyed 
by differential patterns of inactive as 
well as active units within the system 
as a whole (Adey, et al., 1957; French 
& Hernández-Peón, 1955; Hernández-Peón 
& Hagbarth, 1955). These 
findings lend substance to the con- 
clusion that “the identification of 
unique factors corresponding to each 
corticifugal path impels one to leave 
open the possibility that the tem- 
poral configuration of activity may 
provide a code for specificity of in- 
formation conveyance even to a dif- 
fusely projecting system” (Adey, et 
al., 1957). 

Comparison of transmission laten- 
cies in the specific and nonspecific 
systems indicates that impulse veloc- 
ities are faster in the specific sensory 
pathways than in the reticular for- 
mation. For example, the latency of 
the specific evoked potential in the 
sensori-motor cortex following stimu- 
lation of the sciatic nerve was 9-10 
msec. In the midbrain reticular for- 
mation, the conduction times ranged 
from 13-23 msec. (French, et al., 
1953). This difference of 4-14 msec., 
when compared to the 6-12 msec. la- 
tencies of potentials from cortical 
areas to the reticular formation 
(French & Hernández-Peón, 1955), 
suggests that there is time for a 
stimulus to reach the cortex via the 
specific paths and then relay down to 
the reticular formation in time to af- 
fect its own arousal properties. Ex- 
perimental data in support of this 
supposition have been presented by 
Ingvar and Hunter (1955), who re- 
corded brain stem responses to optic 
stimuli in intact cats and in animals 
with chronic bilateral ablations of the 
visual cortex. In the ablated prep- 
arations, it was possible to trace a 
central diencephalic pathway for 
light responses which coincided with 
the thalamic reticular system. The 
mean latency of these responses was 
35 msec., with a range from 28-45 
msec., indicating a slow conduction 
velocity. Although intact prepara- 
tions also showed responses to visual 
stimulation in the diencephalon, no 
distinct central pathway was found 
within the thalamic reticular system 
and the latencies of these brain stem 
potentials were generally in the range 
of 20 to 30 msec. The authors have 
interpreted these shorter latency 
brain responses in the intact animals 
as due to corticifugal influences from 
the visual cortex on the brain stem. 
From a study of the time relations 
involved, they believed it to be possi- 
ble for impulses from the occipital 
areas to influence brain stem poten- 
tials initiated by direct optic collat- 
erals at the pretectal and collicular 
levels, and for the cortex thus to con- 
trol events elicited by visual stimuli 
in the nonspecific pathways of the 
brain stem. 
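
The timing argument in this paragraph can be checked with simple interval arithmetic. The following is a minimal sketch in Python, assuming that the latency ranges quoted above may be treated as plain closed intervals that add along a path; the `Window` type and the helper names are illustrative only and are not drawn from the cited studies.

```python
from typing import NamedTuple, Optional


class Window(NamedTuple):
    """A closed latency interval in msec (illustrative type, not from the sources)."""
    lo: float
    hi: float


# Latency ranges as quoted in the text (sciatic nerve stimulation).
specific_to_cortex = Window(9, 10)     # specific path to sensori-motor cortex
direct_to_reticular = Window(13, 23)   # direct arrival in midbrain reticular formation
cortex_to_reticular = Window(6, 12)    # cortical areas down to reticular formation


def chain(a: Window, b: Window) -> Window:
    """Latency of two stages in series, assuming the ranges simply add."""
    return Window(a.lo + b.lo, a.hi + b.hi)


def overlap(a: Window, b: Window) -> Optional[Window]:
    """Shared portion of two intervals, or None if they do not meet."""
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return Window(lo, hi) if lo <= hi else None


# Route: stimulus -> cortex (specific path) -> reticular formation.
relay = chain(specific_to_cortex, cortex_to_reticular)

# Portion of the relay arrival window that coincides with direct reticular activity.
shared = overlap(relay, direct_to_reticular)

# The 4-14 msec difference given in the text corresponds to subtracting
# the 9 msec cortical latency from each end of the reticular range.
head_start = Window(direct_to_reticular.lo - specific_to_cortex.lo,
                    direct_to_reticular.hi - specific_to_cortex.lo)

print(relay, shared, head_start)
```

Under these assumptions the cortical relay reaches the reticular formation 15 to 22 msec after the stimulus, which falls inside the 13 to 23 msec window of direct reticular responses. The sketch ignores any additional synaptic or integration delays, so it illustrates only that the quoted figures are mutually compatible with the authors' supposition.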

The potential extent of this con- 
trol has been strikingly demon- 
strated by Adey, Segundo, and Livingston 
(1957), who reported that 
stimulation of the cortical areas pre- 
viously listed (the hippocampal gyrus 
and the temporal gyrus being the 
most effective of these) blocked 
conduction in the reticular formation 
between the midbrain and the thala- 
mus. In the region of the hippo- 
campal gyrus, for example, single 
cortical shocks induced profound 
blocking interaction in the reticular 
formation lasting for two seconds. 
These two studies should be of 
particular interest to psychologists, 
for not only do they provide a neuro- 
physiological basis for phenomena 
involving aberrations of conscious- 
ness and memory under emotional 
stress, but they strongly indicate the 
critical role of the cortex, with its 
highly discriminative properties, in 
the selection and transmission of 
sensory input. If the perception of 
complex stimuli requires extensive 
supportive elaboration from nonspe- 
cific sources in order to be retained 
as conscious memory, and if reticular 
input can be blocked during this pe- 
riod of consolidation by a discrimina- 
tive center which is capable of moni- 
toring its own input, then theoretical 
constructs such as “subception,” 
“perceptual defense,” and “repression” 
may have more validity than 
their critics have yet been prepared 
to admit. Repression, for example, 
has been conceptualized in terms of 
two components—a withdrawal or 
expulsion from consciousness of the 
ideational representation of the dan- 
gerous impulse, and a “warding off” of 
any external stimulus which, by asso- 
ciation with the repressed thought, 
would restore it to consciousness 
(Fenichel, 1945). Neurophysiologically, 
it may well be possible to dis- 
turb ideation by a blocking of reticu- 
lar conduction which would produce 
a transitory change in the level of 
consciousness, while the reception 
and transmission of external sensory 
stimuli could be disrupted by reticu- 
lar control of peripheral afferent in- 
put within the specific sensory paths. 
These suppositions are highly specu- 
lative, but do serve to illustrate some 
possible applications of reticular 
mechanisms to behavioral phenom- 
ena. 


THE RETICULAR SYSTEM AND THE 
LEARNING PROCESS 


Although isomorphism between be- 
havior and brain processes is not an 
essential condition of their interrelat- 
edness, evidence of the same varia- 
bility and plasticity which charac- 
terize behavior has been sought by 
theorists in the neural systems which 
supposedly underlie this behavior. It 
is of interest then to inquire: (a) to 
what extent the reticular formation is 
involved in the learning process; (b) 
to what degree its activity is capable 
of modification; (c) whether changes 
in reticular activity are initiated 
within the reticular system itself, or 
whether they are controlled by higher 
centers. 

Many of the studies seeking to in- 
vestigate the relationship of brain 
activity to learning have utilized con- 
ditioning of the alpha rhythm as 
their operational technique. The pro- 
cedure for alpha conditioning in- 
volves the pairing of a conditioned 
stimulus, either auditory, visual, or 
tactile, with the unconditioned stim- 
ulus of a flickering light. The uncon- 
ditioned response is a blocking or de- 
synchronization of the alpha rhythm 
to high frequencies of stimulation, or 
a photic driving at the frequency of 
the flashing light for stimuli between 
six and twelve cycles per second. 
Morrell and Jasper (1956) have found 
that following a period of adaptation 
to the CS, at which time its presentation 
no longer evoked recordable 
electrical response, conditioning to 
the paired CS and US occurred in 
three stages: an initial generalized 
blocking or desynchronization to the 
CS which appeared simultaneously 
throughout the cortex; an interven- 
ing phase of localized discharge 
which was frequency-specific to the 
unconditioned stimulus; and a final 
stage of desynchronization which 
was mainly localized in the occipital 
cortex, and, to a lesser extent, in the 
surrounding parietal and posterior 
temporal areas. These changes were 
specific to the particular conditioned 
stimulus utilized and, once estab- 
lished, did not generalize either 
within or among sensory modes. 
Electrographic conditioning ex- 
periments have presented evidence of 
highly consistent and characteristic 
changes in alpha following the pres- 
entation of a CS. The methodologi- 
cal similarity of this type of condi- 
tioning to behavioral learning is not, 
however, sufficient basis for assum- 
ing that conditioned flicker dis- 
charges are the neural equivalents of 
behavioral responses. In a procedure 
involving two training stages, Chow, 
Dement, and John (1957) first condi- 
tioned three adult cats to perform an 
avoidance response in a double grill 
box with flickering light as the CS 
and an electric shock as the US. The 
CS evoked photic driving in the 
electrocorticogram and the US forced 
the cats to cross over to another com- 
partment in the box. After repeated 
paired presentations, the flicker by 
itself elicited both the ECG repeti- 
tive discharge and the behavioral 
crossing. The cats were then trained 
to a conditioned ECG response in an 
animal holder, with a tone as CS and 
the flickering light as US. After the 
cats acquired both these CRs, they 
were returned to the double grid box 
to test whether the conditioned ECG 
discharge would be associated with 
the behavioral response. Tone alone 
was presented: it evoked the ECG 
change but did not lead to behavioral 
crossing. The authors concluded 
that “Establishment of partial equiv- 
alence between two different stimuli 
via cortical conditioning, using more 
or less similar electrical responses to 
both stimuli as the indicator, is not 
sufficient to establish overt behav- 
ioral equivalence between these stim- 
uli, using the manifestation of the 
conditioned avoidance response as 
the behavioral criterion.” 

Despite an apparent lack of equiv- 
alence between the electrical activity 
of the brain, as recorded by gross 
EEG methods, and behavioral re- 
sponses, electrographic conditioning 
does provide a valuable tracer mech- 
anism for following and analyzing the 
functional changes in brain activity 
which accompany the learning proc- 
ess. Utilizing recordings of condi- 
tioned electrical activity from corti- 
cal and subcortical structures, Yoshii, 
Pruvot, and Gastaut (1957) have 
presented evidence that the repeti- 
tive discharges at the frequency of 
the CS were earlier in onset, higher 
in amplitude, and more stable in oc- 
currence in subcortical structures, 
particularly in the midbrain reticular 
formation, than in the occipital cor- 
tex. As conditioning proceeded, the 
cortex became progressively more 
synchronized with the reticular for- 
mation until the relationship be- 
tween them almost reached identity. 
The critical role of the midbrain 
reticular formation is also sup- 
ported by the results of Hernández- 
Peón, Brust-Carmona, Eckhaus, 
Lopez-Mendoza, and Alcocer-Cuaron 
(1956), who established a conditioned 
salivation to visual and tactile stimuli, 
and then made restricted lesions in a 
number of subcortical structures, including 
the midbrain reticular forma- 
tion, septal area, medial thalamic 
nuclei, superior colliculi, etc. Only 
the lesions in the midbrain reticular 
formation, which never resulted in 
coma, abolished or reduced the con- 
ditioned salivary response in the 
awake animal. Since unconditioned 
salivation was unaffected or even en- 
hanced after lesion, the authors con- 
cluded that “learning seems to re- 
quire the functional integrity of the 
brain stem reticular system.” 
Habituation studies on both the 
arousal response and the specific 
evoked potential have also proved 
fruitful in analyzing the role of the 
reticular system in learning. Record- 
ing from naturally sleeping, unan- 
esthetized cats with permanently im- 
planted electrodes in cortical and 
subcortical structures, Sharpless and 
Jasper (1956) found that habituation 
of the arousal response to simple 
tones occurred rapidly in intact ani- 
mals. The habituation was fre- 
quency-specific, although it also 
showed some degree of generaliza- 
tion. Many of the intact animals also 
gave an habituation response which 
was specific to the particular pattern 
of tone utilized, although changes in 
pattern were less effective in produc- 
ing arousal after adaptation than 
changes in simple tones. Selective 
lesions within the specific sensory 
paths produced differential effects 
upon adaptation. Removal of the 
cortex abolished pattern-specific ha- 
bituation; while transection below 
the geniculate bodies destroyed both 
pattern and frequency-specific ha- 
bituation. The habituation of the 
arousal response was found to be in- 
dependent of changes in the primary 
sensory pathways, since the specific 
evoked potentials could still be ob- 
tained from the cortical projection 
areas after the stimulus had lost its 
power to elicit generalized activation. 
Sharpless and Jasper concluded that 
habituation of arousal occurs within 
“the brain stem reticular and the un- 
specific thalamic systems with their 
associated collateral pathways.” 
Whether habituation is an autono- 
mously initiated activity of the retic- 
ular system in response to informa- 
tion transmitted directly by the 
classical sensory paths, or whether it 
is controlled by higher centers is not 
known at this time. The authors did 
present evidence that adaptation re- 
mained frequency-specific despite 
cortical removal for cats with collat- 
erals from intact medial geniculate 
bodies into the diffuse thalamic nu- 
clei. In contrast, adaptation was spe- 
cific only to intensity and sensory 
mode for those animals who, due to 
transection of the specific sensory 
pathways above the colliculi, pre- 
sumably retained functional collat- 
erals solely to the brain stem reticu- 
lar formation. Whether this differ- 
ence in specificity between the thala- 
mic and brain stem reticular systems 
represents a real distinction between 
the discriminative properties of the 
systems, or merely reflects variation 
in the amount of information trans- 
mitted by the specific sensory collat- 
erals at the two levels is unknown. 
Nevertheless, it is tempting to specu- 
late upon the possible implications of 
these results for the concept of gen- 
eralization. The Sharpless and Jasper 
data would seem to indicate the pos- 
sibility that two different neurologi- 
cal mechanisms are involved in the 
generalization of arousal at the non- 
specific level, with intensity gen- 
eralization dependent upon the ac- 
tivity of the brain stem reticular for- 
mation and quality discrimination a 
function of the thalamic reticular 
system. If this distinction is valid, 
then analysis of the functioning of 
these two systems may clarify the 
conditions under which generaliza- 
tion reflects a differentiated response 
to dimensional similarities, as dis- 
tinct from the occasions when it 
merely represents a failure of dis- 
crimination. If finer discriminations 
are indeed related to thalamic mech- 
anisms, then the prepotence of brain 
stem reticular arousal over thalamic 
activity under conditions of high ac- 
tivation would suggest a possible 
neural mechanism underlying the 
failures of discrimination which occur 
under conditions of high drive. 

The studies cited thus far have 
been concerned with modification of 
activity within the reticular system, 
either directly or as indexed by the 
arousal response. There is also evi- 
dence of changes of function within 
the specific pathways as a result of 
reticular control. Galambos, Sheatz, 
and Vernier (1956) presented con- 
tinuous auditory clicks to cats over 
extended periods of time while re- 
cording from the auditory and visual 
cortex, cochlear nucleus, hippocam- 
pus, septal area, and amygdala. After 
the animals had been subjected to 
the stimuli for hours or days, evoked 
potentials at all loci either disap- 
peared or else were small and irregu- 
lar in nature. Concomitantly, there 
was a lack of consistent behavior to- 
ward the stimuli. After 10 to 20 
strong shocks had been paired with 
the clicks, recordings from the coch- 
lear nucleus, as well as from the 
other sites, showed augmentation of 
the evoked potential. Behavioral 
changes, such as crouching, alertness, 
snarling, etc., appeared concomit- 
antly. During the extinction process, 
both behavioral and electrophysio- 
logical responses disappeared, with 
motor responses extinguishing prior 
to the evoked potentials. Thus, be- 
havioral conditioning and extinction 
would appear to be accompanied 
by consistent electrophysiological 
changes in the specific sensory pro- 
jections, the reticular activating sys- 
tems, and the rhinencephalic struc- 
tures. That these changes within the 
specific pathways were a consequence 
of reticular activity would seem a log- 
ical conclusion on the basis of the evi- 
dence that electrical stimulation of 
the midbrain reticular system led to 
depression of the auditory evoked 
potential in the cochlear nucleus. 

Both photic habituation (Hernán- 
dez-Peón, Guzmán-Flores, Alcaraz, 
& Fernández-Guardiola, 1956, 1957) 
and acoustic habituation (Hernán- 
dez-Peón et al., 1957; Hernández- 
Peón & Sherrer, 1955; Hernández- 
Peón, Sherrer, & Jouvet, 1956) have 
been reported. The auditory habitu- 
ation occurred to sounds of constant 
intensity, repeated thousands of 
times. Habituation continued when 
the animal was asleep and was selec- 
tive to the particular frequency of 
sound used. Once habituation had 
been established, recovery in the 
magnitude of the evoked potential 
was found to occur under the follow- 
ing conditions: (a) after a period of 
rest following discontinuation of the 
habituation stimulus; (b) after sudden 
and intense acoustic stimuli; (c) after 
pairing with electrical shock; (d) 
after lesions of the midbrain reticular 
formation; and (e) under pentobarbi- 
tal anesthesia, which depresses the 
activity of the reticular system. The 
release of habituation under anes- 
thesia and following brain stem le- 
sions indicated that the reticular sys- 
tem was critically involved in habitu- 
ation, but whether it functioned 
autonomously or merely as a way sta- 
tion for cortical control remained un- 
determined. 

In summary, then, the studies re- 
viewed in this section confirm the 
critical role of the reticular structures 
in the learning process, but fail to 
clarify the extent of their functional 
autonomy. Unfortunately, analysis 
of the relative contributions of the 
specific sensory structures and the 
reticular system to adaptation and 
generalization involves many experi- 
mental difficulties. The resolution 
of these problems will go far toward 
clarifying the nature of processes 
which are fundamental to learning. 


CONCLUSION 


Studies of the reticular formation 
indicate that its structural complex- 
ity and functional plasticity override 
the conceptual limitations inherent 
in more static, reflex-like neural 
mechanisms. These characteristics 
permit it to exert facilitatory and 
suppressive effects which have a 
time-course of seconds and even min- 
utes on the activity of central nerv- 
ous system structures. This span is 
comparable to that of many behav- 
ioral events. 

The neurophysiological distinction 
between “specific” and “nonspecific” 
systems is particularly relevant to 
psychological theory. Constructs 
such as attention, perception, moti- 
vation, drive, reward, and punish- 
ment possess a common factor of non- 
specific reticular activation in addi- 
tion to their specific properties. This 
general factor of nonspecific activity 
has effects which are lawfully related 
to the timing and intensity of its ap- 
plication. It is essential, therefore, 
that psychological constructs be criti- 
cally evaluated in an attempt to de- 
termine the extent to which they are 
a function of “nonspecific” as well as 
“specific” factors. Conceptual reas- 
sessment may well indicate that cate- 
gories now regarded as independent 
and mutually exclusive in terms of 
operational criteria are functionally 
interrelated on the basis of a common 
factor of reticular activation. 


Whenever an experiment involves 
collecting data from more than two 
groups or under more than two condi- 
tions we become involved in the prob- 
lem of multiple comparisons—the 
problem of comparing each group 
with every other group or arranging 
the results in rank order. This be- 
comes a problem when we wish to 
assign a level of confidence or signifi- 
cance to our conclusions about the 
relationships among all of the popu- 
lations involved. Classical methods 
such as the F test permit us only to 
reject the over-all null hypothesis 
that all of the means are equal but 
they do not provide a procedure for 
comparing specific means with each 
other. 

In the older psychological litera- 
ture, this problem has been dealt 
with in a haphazard manner, without 
recognizing the issues involved. More 
recently, statistical procedures spe- 
cifically designed for multiple com- 
parisons have become available and 
have been discussed briefly in the 
psychological journals (McHugh & 
Ellis, 1955; Stanley, 1957). It has 
not been clear to many psycholo- 
gists, however, that there are several 
different methods with different basic 
assumptions or approaches. There 
are important questions of logic in- 
volved in the use of these methods 
and these issues have not been clearly 
faced in the psychological literature. 
This is partly because many of the 
papers by statisticians on this sub- 
ject are in sources which are inacces- 
sible or rarely used by psychologists. 
In particular, one of the most exten- 
sive discussions of the logical prob- 
lems of multiple comparisons, that of 
J. W. Tukey, has been available only 
in a privately circulated paper.² 
Other aspects of the problem have 
not been dealt with at all, to this 
writer’s knowledge, so that it seems 
to be time for an attempt to survey 
the problem systematically. 

1 The writer wishes to express his appreci- 
ation to Urie Bronfenbrenner and W. T. Fe- 
derer for their detailed comments and sug- 
gestions upon an earlier draft of this paper. 

The emphasis here is upon ques- 
tions of logic rather than specific 
methods of computation. For the 
latter, we shall simply refer to appro- 
priate sources, after we have tried to 
make clear the implications of choos- 
ing to use a particular method or set 
of tables. 

Multiple comparisons and other 
multiple tests. Multiple comparisons 
are only one instance of the use of 
multiple statistical tests in a single 
piece of research. We shall not have 
space to deal explicitly with the other 
tests except as we need to distinguish 
them from the problem of multiple 
comparisons. One of the sources of 
confusion in the past has been the 
failure to distinguish one kind of 
multiple testing from another. 

In order to prevent this kind of 
confusion from the outset, we may 
list at least five main cases in which 
multiple statistical tests are em- 
ployed: 

1. Multiple comparisons. This 
covers all cases in which results in 
several different groups are to be 
compared. Any statistic may be in- 
volved: mean, median, frequency, 
correlation coefficient, etc. For ex- 
ample, we may wish to compare the 
correlations between intelligence and 
school grade in Schools A, B, C, D, 
E, etc. The methods which have been 
explicitly published and taken up by 
psychologists have all been con- 
cerned, however, with comparisons 
in terms of the means of groups. 
Some methods for other statistics, 
e.g., proportions, are now beginning 
to appear. 

2 J. W. Tukey, The Problem of Multiple 
Comparisons. Privately circulated mono- 
graph, 1953. 

2. Multiple tests with intercorre- 
lated variables. The most common in- 
stance of this case is the computation 
of a number of different correlation 
coefficients with a single batch of Ss. 
If 10 tests are given there may be 45 
intercorrelations and the researcher 
may wish to state which of these cor- 
relations are significant. 

3. Multiple variables in analysis of 
variance. A factorial design will per- 
mit the computation of several F 
ratios for the data of an experiment. 
These F ratios may not be inde- 
pendent if a common error estimate 
is used for several of them. Whether 
independent or not, several tests are 
made in the same experiment, and the 
implications of this fact need to be 
analyzed. Similar problems arise if 
other kinds of analysis are used for 
what is essentially a factorial design. 
For example, several nonparametric 
tests may be made of different rear- 
rangements of the data in a way 
which is equivalent to analysis of the 
main effects in analysis of variance. 

4. Replicated tests of a single hy- 
pothesis. In the first three cases men- 
tioned above the statistical tests are 
concerned with different hypotheses. 
For example, the different F tests in 
a factorial design are concerned with 
different variables. This fourth head- 
ing is concerned with cases where the 
same experiment is repeated with dif- 
ferent groups of Ss and repeatedly 
tested for statistical significance. 

5. Overlapping measures relating to 
a single hypothesis. Several different 
ways of measuring the same underly- 
ing variable may be available—e.g., 
different measures of rate of learning 
—and a significance test is applied to 
each of the measures separately. 

The main purpose of the list is to 
emphasize that we are concerned 
only with the first of these headings. 
Space will not permit us to analyze 
the other cases, which must be left to 
later discussions. 


GENERAL ISSUES IN MULTIPLE COMPARISONS 


A priori vs. a posteriori compari- 
sons. It has been assumed that no 
special modifications of classical 
methods are needed where the com- 
parisons to be made are specified in 
advance of the collection of data (a 
priori). Most of the recent literature 
on multiple comparisons has concen- 
trated upon methods for making 
comparisons suggested by the data 
(a posteriori, also called post-mortem 
comparisons). For example, suppose 
that five conditions of learning are 
being compared. In advance, the ex- 
perimenter predicts from his learning 
theory that Condition A will lead to 
most rapid learning, Condition B 
will be second, and so on. Fisher 
(1947), and others following him, 
have recommended that the experi- 
menter perform an over-all F test 
first, then, if this is significant, he 
may perform ordinary t tests be- 
tween A and B, B and C, etc. It is 
pointed out, however, that this 
method would be incorrect if the 
comparisons to be tested had not 
been selected in advance (Fisher, 
1947; McHugh & Ellis, 1955; Stan- 
ley, 1957). The new methods have 
been designed for comparisons sug- 
gested by an inspection of the data. 

We shall contend that the differ- 
ences between the a priori and the a 
posteriori situation are slight, or even 
nonexistent, when everything is taken 
into account. This is to say that the 
newer methods are needed for all 
multiple comparisons, and that the 
classical methods are inappropriate 
even in a priori comparisons, except 
for very special circumstances. 

The issue here is similar to that in- 
volved in the debate over “one- 
tailed” vs. “two-tailed” tests of sig- 
nificance for comparing two groups 
(Burke, 1953; Hick, 1952; Jones, 
1952; Marks, 1951). The one-tailed 
test is appropriate only if the direc- 
tion of difference is predicted in ad- 
vance, and if the experimenter is 
willing to overlook any difference in 
the opposite direction, no matter 
how large. Only two conclusions are 
possible from the data when a one- 
tailed test is used—either there is a 
difference in the predicted direction, 
or the results of the experiment are 
inconclusive; in effect, the experi- 
ment cannot obtain results which are 
considered as a significant refutation 
of the prediction. If the experi- 
menter allows for the possibility of a 
result that contradicts his hypothe- 
sis, he must use a two-tailed test, and 
there is no difference in method of 
analysis from that used in an empiri- 
cal experiment where no predictions 
are made in advance. 

In the case of more than two 
means, the number of possible con- 
clusions is increased. We may have 
not only confirmation or contradic- 
tion of the prediction, but we may 
also have varying degrees of partial 
agreement with the prediction. Since 
it is usually not specified in advance 
what will be considered as a partial 
confirmation of the prediction, the 
situation is reduced essentially to the 
a posteriori case. Only if the experi- 
menter states in advance all possible 
conclusions and the rules by which 
these conclusions will be drawn, 
would he have an a priori test. 

Because of the multiplicity of con- 
clusions which might be drawn, it 
would appear most feasible to con- 
sider the statistical analysis as inde- 
pendent of any predictions of the ex- 
perimenter. In other words, we con- 
sider the statistical analysis as a 
method of making statements about 
the state of affairs as revealed by the 
data. If it turns out that the state of 
affairs is in complete or partial agree- 
ment with the prior prediction, the 
experiment makes the theory more 
plausible. If the results are wholly 
or partially in opposition to the pre- 
diction, then the theory needs to be 
revised. 

At this point the position must be 
stated very dogmatically. After some 
of the other problems have been 
dealt with, and a more complete 
terminology has been developed, we 
shall be able to give these conclusions 
further support. 

The concept of the error rate. The 
notions of significance level or confi- 
dence level have been useful ideas so 
long as we were dealing with a single 
difference between one pair of means, 
a single F ratio, a single chi-square 
value, and so on. The use of these 
terms becomes confused, however, 
when we are making simultaneous 
statements about a number of differ- 
ent comparisons of means, several 
different F ratios in a single experi- 
ment, or the like. The confusion is 
due to the fact that the concept of 
significance level may be extended in 
several different directions when we 
are considering multiple comparisons 
or multiple tests. We owe much to 
J. W. Tukey, who has clarified this 
point, and who has developed the 
concept of error rate for multiple 
comparisons (see Footnote 2). 
There are several different kinds of 
error rate involved in the multiple 
comparison problem (and in other 
situations involving multiple tests). 
Some methods of making multiple 
significance tests fix one of the error 
rates at a suitably low level, but may 
allow the other error rates to become 
absurdly large. The problem be- 
comes that of deciding which error 
rates should be kept under control, or 
what compromises may be effected. 
Three of the main kinds of error 
rate are: 
1. Error rate per comparison. This 
is the probability that any particular 
one of the comparisons will be incor- 
rectly considered to be significant. 
In general this approach is discour- 
aged by statisticians for reasons ex- 
plained below. 
2. Error rate per experiment.³ This 
is the long-run average number of er- 
roneous statements per experiment. 
In statistical jargon it is the expected 
number of errors per experiment. Un- 
like the first error rate, which is a 
probability, the error rate per experi- 
ment could be greater than one. That 
is, we could set a criterion of “sig- 
nificance” in such a way that we 
would average three false statements 
for each experiment. 

3 Tukey’s terminology is based upon “fami- 
lies” of comparisons rather than upon the ex- 
periment. In the one-dimensional case, these 
are equivalent terms. That is, the comparison 
of each mean with each other in the experi- 
ment is a “family” of comparisons. If we are 
concerned with two-variable analysis, how- 
ever, the experiment may be broken down 
into two families of comparisons, one for each 
variable. We could therefore specify an error 
rate per family and a rate familywise as well 
as per experiment and experimentwise. Our 
discussion will be based primarily upon the 
one-dimensional problem, and it seemed that 
the discussion would be clearer if we emphasized 
the experiment as a unit. Even where there 
are several families of comparisons, we shall 
argue that the experiment should be the basis 
of analysis of the error rates. Another discus- 
sion of experiment-based error rates is found 
in H. O. Hartley (1955). 

3. Error rate experimentwise. This 
is the probability that one or more 
erroneous conclusions will be drawn 
in a given experiment. In other 
words, experiments are divided into 
two classes: (a) those in which all 
conclusions are correct, and (b) those 
in which some conclusions are incor- 
rect. The error rate experimentwise 
is the probability that a given experi- 
ment belongs in class (b). 

It may help to understand the dis- 
tinctions among these error rates if 
we think of a long series of experi- 
ments carried out in a given field, 
all with the same experimental de- 
sign. In each experiment a certain 
number of statements of significance 
is made—e.g., “Method A is signifi- 
cantly better than Method B”; 
“Method C is significantly poorer 
than Method B.” To be concrete, 
suppose that there were 1000 experi- 
ments, each with 10 statements of 
significance, 10,000 statements in all. 
Of these statements, 90 are actually 
false, and these false statements are 
distributed among 70 of the experi- 
ments. The different error rates are 
then as follows: 


1. Error rate per comparison: 90/10,000 or .009 

2. Error rate per experiment: 90/1000 or .09 

3. Error rate experimentwise: 70/1000 or .07 

If we look only at the error rate per 
comparison we would say that the 
statements of significance were made 
at better than the “.01 level.” Yet 
the probability is greater than .05 
that any given experimental report 
will contain one or more false claims 
of significance. 
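
The arithmetic of this illustration can be checked with a short Python sketch. The function and field names are our own illustrative choices and do not come from any statistical library; the counts are those of the hypothetical series of 1000 experiments described above.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    per_comparison: float   # false statements / all statements made
    per_experiment: float   # false statements / experiments (an average; may exceed 1)
    experimentwise: float   # experiments with at least one false statement / experiments


def error_rates(n_experiments, statements_per_experiment,
                n_false_statements, n_experiments_with_error):
    """Compute the three error rates from simple counts (illustrative helper)."""
    total_statements = n_experiments * statements_per_experiment
    return ErrorRates(
        per_comparison=n_false_statements / total_statements,
        per_experiment=n_false_statements / n_experiments,
        experimentwise=n_experiments_with_error / n_experiments,
    )


# 1000 experiments, 10 statements each, 90 false statements spread over 70 experiments.
rates = error_rates(1000, 10, 90, 70)
print(rates)
```

The same 90 false statements thus yield three different figures, depending only upon the denominator chosen and upon whether statements or experiments are counted.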

The various error rates are all the 
same in a simple experiment with a 
single comparison, but they become 
more and more divergent as the num- 
ber of comparisons per experiment in- 
creases. Thus, if each of 10 means is 
compared with each of the others 
there are 45 comparisons in one ex- 
periment. If the “significance level” 
(Error Rate 1 above) of the test ap- 
plied to each comparison is .01, we 
should expect .45 erroneous conclu- 
sions per experiment. The probability 
that there will be one or more incor- 
rect conclusions in a given experi- 
ment (Sense 3) will be somewhere be- 
tween these two values, usually closer 
to .45, as will be explained below. 
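
A rough numerical sketch of this example follows. It assumes, purely for illustration, that the 45 tests are statistically independent; they are not, since each mean enters nine of the comparisons, so the experimentwise figure computed here is only an approximation. The function name is hypothetical.

```python
from math import comb


def rates_for_all_pairs(n_means, alpha):
    """Error rates when every pair of means is tested at level alpha.

    The experimentwise value assumes independent tests, which is a
    simplifying assumption and not a property of real pairwise comparisons.
    """
    n_comparisons = comb(n_means, 2)              # number of distinct pairs
    per_experiment = n_comparisons * alpha        # expected number of errors
    experimentwise_if_independent = 1 - (1 - alpha) ** n_comparisons
    return n_comparisons, per_experiment, experimentwise_if_independent


n_comp, per_exp, exp_wise = rates_for_all_pairs(10, 0.01)
print(n_comp, round(per_exp, 2), round(exp_wise, 3))
```

Under the independence assumption the experimentwise rate falls between .01 and .45 and lies much nearer the upper value, which agrees with the statement in the text.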

Which of the three values is, then, 
the “significance level” to be at- 
tached to the conclusion from this ex- 
periment? This is a point for exten- 
sive analysis, but we shall need more 
concepts before we can do it justice. 
At this point we shall say only that 
the basis for the choice between these 
three rates is still incompletely an- 
alyzed. Statistical workers have rec- 
ognized the problem and have de- 
veloped their procedures for multiple 
comparisons primarily on the basis of 
the third rate of error—the proba- 
bility that one or more erroneous con- 
clusions will be made in a given ex- 
periment, the experimentwise error 
rate. The implications of this deci- 
sion have not, however, been exten- 
sively developed, at least to the pres- 
ent writer’s knowledge. 

Multiple null-hypotheses. The con- 
cept of error rate cannot be defined 
completely without taking account of 
another important fact. In our dis- 
tinctions between error rates per 
comparison, per experiment, and ex- 
perimentwise, the reader may have 
inferred that the null hypothesis 


would be that all means are drawn 
from a single population—the same 
null hypothesis which is tested in anal- 
ysis of variance by means of the F 
test. We shall call this the “com- 
plete” null hypothesis. This is one 
possibility which must be consid- 
ered, but it is not by any means the 
only one. In our example of 10 
means, five might be drawn from one 
population and five from another, six 
from one and four from the other, 
two from each of five different popu- 
lations, and so on. For each of these 
different null hypotheses, there is an 
error rate per comparison, per experi- 
ment, and experimentwise, for any 
given method of testing differences. 
The question is, therefore, which of 
these null hypotheses is used to de- 
fine the error rate for our statistical 
test? 

Tukey’s answer (see Footnote 2) 
to the above question is to define the 
error rate as the maximum value it at- 
tains under all possible null-hy- 
potheses. Some of the currently pro- 
posed methods for multiple compari- 
son are based solely upon the com- 
plete null hypothesis as the standard, 
even though the error rate may be 
higher with some other null hypothe- 
sis. Tukey’s decision would seem the 
most reasonable as well as the most 
cautious approach to this aspect of 
the problem. 

To show how the error rate may be 
higher for some partial null hypothe- 
sis than it is for the complete hy- 
pothesis, let us consider a specific 
method of testing multiple differ- 
ences based upon traditional ap- 
proaches. Ten groups are being com- 
pared, and we test first with an over- 
all F test at the .01 level. Then if the 
F test shows significance, we will 
test each difference with an ordinary 
t test at the “.01 level.” The experi- 
mentwise error rate is .01 if we con- 
sider only the complete null hy- 
pothesis, since no further compari- 
sons will be made if the F test does 
not show significance. The F test is 
specifically designed to produce this 
error rate under the complete null 
hypothesis. 

Suppose, however, that there are 
actually five populations, with two 
groups drawn from each population, 
and suppose that these populations 
are widely separated. Then it is al- 
most certain that the F test will be 
significant, and t tests between pairs 
drawn from distinct populations will 
also be almost certainly significant, 
as they should be. We can still make 
errors, however, in comparing means 
in the pairs drawn from identical 
populations. Since there are five such 
comparisons, the probability that 
one or more of these will be incor- 
rectly judged to be significant is 
(1 - .99⁵), which is approximately .05. 
Thus the error rate experimentwise 
is .05 instead of .01 for this particular 
null hypothesis. The more means 
there are to be compared by this 
method, the higher will the experi- 
mentwise error rate become, even 
though the error rate based upon the 
complete null hypothesis is fixed at 
.01 for any number of means. 
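
The same calculation may be extended to the other partial null hypotheses mentioned earlier for 10 means. The sketch below is an approximation under two assumptions of our own: the populations are so widely separated that the over-all F test is virtually certain to be significant, and the t tests within populations are treated as independent. The function names are hypothetical.

```python
from math import comb


def within_population_pairs(group_counts):
    """Number of comparisons between means drawn from the same population."""
    return sum(comb(count, 2) for count in group_counts)


def approx_experimentwise_rate(group_counts, alpha=0.01):
    """Approximate chance of one or more false 'significant' differences.

    Applies only to partial null hypotheses with widely separated
    populations; under the complete null hypothesis the preliminary
    F test holds the rate at alpha instead.
    """
    n_pairs = within_population_pairs(group_counts)
    return 1 - (1 - alpha) ** n_pairs


partitions = {
    "two from each of five populations": [2, 2, 2, 2, 2],
    "five and five": [5, 5],
    "six and four": [6, 4],
}
for label, counts in partitions.items():
    print(label, within_population_pairs(counts),
          round(approx_experimentwise_rate(counts), 3))
```

Taking the largest of such values over all admissible null hypotheses is the sense in which the error rate is defined as a maximum.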

Error rates and a priori compari- 
sons. Now that we have looked at 
some of the different ways of evaluat- 
ing error rates, we can deal more con- 
cisely with the problem of a priori 
vs. a posteriori comparisons. As an 
example, consider a learning experi- 
ment in which five conditions are be- 
ing compared, and suppose that the 
experimenter has predicted in ad- 
vance the complete order in which the 
means should appear. He has, in ef- 
fect, predicted significant differences 
for all possible comparisons of the 
five means, and complete agreement 
with the theory should produce 10 
significant differences. Suppose that 
he merely computes all 10 t ratios in 
the standard way, determining their 
significance by reference to the 
standard “Student” tables, and as- 
sume that he uses the .01 levels from 
these tables. The method which this 
experimenter has used has an error 
rate of .10 per experiment, and also 
experimentwise, even though all of 
the tests were computed on the basis 
of a .01 level for the individual com- 
parisons. In other words, in 10% of 
experiments analyzed by this method, 
there will be one or more “signifi- 
cant”⁴ differences, even though the 
complete null hypothesis is true. 
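
A small Monte Carlo sketch may make these rates concrete. It is a simplification of our own and not the procedure described in the text: the population variance is taken as known, so that two-tailed normal-curve (z) tests stand in for t tests, and each group mean is drawn directly as a standardized normal deviate. In such a simulation the per-experiment rate should come out near .10, while the experimentwise rate can be no larger than that and may fall somewhat below it, because the 10 comparisons share the same five means.

```python
import random
from itertools import combinations
from statistics import NormalDist


def simulate_complete_null(n_groups=5, alpha=0.01, n_experiments=20000, seed=0):
    """Simulate all pairwise z tests when every group has the same true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    pairs = list(combinations(range(n_groups), 2))
    total_false = 0
    experiments_with_error = 0
    for _ in range(n_experiments):
        # Standardized group means under the complete null hypothesis.
        means = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
        # The difference of two such means has standard deviation sqrt(2).
        false_here = sum(
            abs(means[i] - means[j]) / 2 ** 0.5 > z_crit for i, j in pairs
        )
        total_false += false_here
        experiments_with_error += false_here > 0
    return total_false / n_experiments, experiments_with_error / n_experiments


per_experiment, experimentwise = simulate_complete_null()
print(round(per_experiment, 3), round(experimentwise, 3))
```

Nothing in the simulation depends upon whether the order of the means was predicted beforehand.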

Compare this with the case where 
no predictions were made in advance. 
The experiment is performed to “see 
what happens” and, again, all possi- 
ble t tests are computed. The error 
is exactly the same as it was when 
advance predictions were made, if 
we leave aside the question of “one- 
tailed” vs. “two-tailed” tests. (If 
the experimenter in the a priori case 
wishes to allow for contradictions to 
his theory which could come out to 
be “significant” he must use a two- 
tailed test, just like the experimenter 
who makes his comparisons after the 
results are in—a posteriori.) 

In other words, the essential factor 
is the number of comparisons to be 
made and the error rate to be used, 
rather than the question of a priori 
vs. a posteriori comparisons. When 
ordinary t tests are applied to all 
comparisons, each of the different 
kinds of error rate is the same 
whether predictions were made in ad- 
vance or not. 

4 In this discussion “significant” in quotes 
refers to a difference which would be judged 
to be significant in using the classical tables 
and based on single comparisons. 

The only situation in which ad- 
vance predictions make a difference 
would be that in which several 
groups are studied, but only certain 
pairs are to be singled out for signifi- 
cance tests. Suppose that in the a 
priori case, one pair is specified in ad- 
vance as the only comparison of in- 
terest, while in the a posteriori case, 
only the largest difference is to be 
studied. The probability that the 
largest of 10 comparisons will be sig- 
nificant is not the same, of course, as 
the probability that a pair chosen at 
random will be significant. The null 
hypothesis is that the pair chosen in 
advance by the theory might as well 
have been chosen at random. Thus a 
t test applied in the usual way at the 
.01 level has a probability of .01 of 
being significant, in the a priori case. 
When the largest of the 10 differ- 
ences is chosen, it has a probability 
of .10 of appearing to be significant at 
the .01 level by classical two-mean 
tests. The probability that the larg- 
est difference will be “significant” is 
the same as the probability that there 
will be one or more “significant” 
pairs among the 10 comparisons. In 
other words, the experimentwise error 
rate for all comparisons applies to 
this special case. 
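
The contrast between a pair named in advance and the largest observed difference can be sketched by simulation. As a simplification of our own, the variance is again taken as known, so that two-tailed z tests replace t tests, and all five groups are drawn from one population. The pair of groups 0 and 1 stands for the comparison specified beforehand; all names are hypothetical.

```python
import random
from itertools import combinations
from statistics import NormalDist


def simulate_choice_of_pair(n_groups=5, alpha=0.01, n_experiments=20000, seed=1):
    """Compare a pre-specified pair with the largest of all pairwise differences."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    pairs = list(combinations(range(n_groups), 2))
    planned_hits = largest_hits = any_hits = 0
    for _ in range(n_experiments):
        means = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
        z = {(i, j): abs(means[i] - means[j]) / 2 ** 0.5 for i, j in pairs}
        planned_hits += z[(0, 1)] > z_crit             # pair chosen in advance
        largest_hits += max(z.values()) > z_crit       # pair chosen after the fact
        any_hits += any(value > z_crit for value in z.values())
    n = n_experiments
    return planned_hits / n, largest_hits / n, any_hits / n


planned, largest, any_pair = simulate_choice_of_pair()
print(round(planned, 3), round(largest, 3), round(any_pair, 3))
```

The second and third figures are necessarily identical, since the largest difference exceeds the cutoff exactly when at least one difference does; the first should stay near the nominal .01.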

The above example is helpful in 
seeing the issues involved in multiple 
comparisons, but it has little practi- 
cal application. Tukey suggests that 
it might occur when all but two of the 
groups were studied as “camouflage” 
and only the particular two are of 
interest to the experimenter. Usu- 
ally, however, an experimenter who 
is testing a theory will use five groups 
for one of two reasons: (1) all are in- 
terrelated in the theoretical predic- 
tions or (2) some of the groups are 
predicted from theory while others 
are unpredictable from the stand- 
point of theory but the experimenter 
wishes to find out how they compare 
with each other and with those which 
are predicted by the theory. We 
have already shown that the first 
case is no different from the com- 
pletely empirical exploratory study. 
The second case would be different 
only if the results were considered as 
belonging to two separate and unre- 
lated experiments—one group of 
comparisons being used to test the 
theory, the other comparisons being 
considered as part of another empiri- 
cal exploration. 

In all of these examples, we have 
assumed that the experimenter who 
is making a priori comparisons will 
consider each “significant” difference 
in the predicted direction as support- 
ing his theory, and each “significant” 
difference in the opposite direction 
as a contradiction to his theory. He 
could, of course, have specified other 
rules for interpreting the results. In 
actual practice of psychological re- 
search, however, he rarely does, and 
the usual situation is that no rules at 
all are specified in advance. The de- 
cision as to what constitutes “agree- 
ment,” “partial agreement,” and so 
on, is made only after the results are 
in and the significance tests are made. 
This is another strong reason, already 
mentioned in the preliminary discus- 
sion of this problem, for treating all 
cases of multiple comparison in the 
same manner, whether there are pre- 
dictions in advance or not. 

Nevertheless, we should investi- 
gate to see if carefully specified rules 
for interpreting the results in rela- 
tion to the theory would have any ef- 
fect upon the significance tests. Sup- 
pose, for example, that the experi- 
menter says, “If there are at least 
some significant differences in the 
predicted direction, and none in the 
opposite direction, I shall consider 
the theory as partially substantiated. 
If there are any differences which appear 
to be significant reversals of prediction 
I shall revise or abandon the 
theory.” The error rate must now in- 
clude errors of false acceptance of the 
theory and also errors of false rejec- 
tion. 

Here it is easy to show that there 
are circumstances in which false ac- 
ceptance of the theory is almost cer- 
tain. For example, suppose that 
Populations A and B are equal and 
substantially higher in mean than 
Populations C and D, the latter pair 
also being equal. Finally, Population 
E has a considerably lower mean than 
any other group. The psychological 
theory has predicted that all groups 
are different, with mean A the highest, 
B next, and so on down to E. In 
other words we are assuming that 
the actual state of affairs is in partial 
agreement with the theoretical predictions, 
but that the theory is 
wrong in the relation of A to B and 
C to D. We suppose in addition 
that the differences which do exist 
are so large that significant differences 
are almost certain in those particular 
comparisons. In this situation 
the theory will be rejected only if A 
is found to be significantly lower than 
B or C lower than D. In all other 
cases the theory will be considered as 
supported by the experimental results. 
If t tests are made at the .01 
level, there is only a .005 probability 
that Groups A and B will be found 
in significant contradiction to the 
theory, and the same value applies 
to the C-D pair. The probability that 
one or both will be reversed is approximately 
.01. Therefore the experimenter 
has a .99 probability of 
finding support for his theory and 
only a .01 probability of contradicting 
it. 

At this point the reader may object 
that accepting the theory under 
these circumstances should not be 
considered as entirely erroneous. After 
all, the actual state of affairs in 
the populations is at least partially in 
agreement with the prediction from 
theory. Certainly, to accept the 
theory in this case would not be so 
bad as to accept the theory when the 
actual population values are a com- 
plete reversal of the predicted levels. 
The question then becomes: How do 
we evaluate different degrees of 
agreement between the actual state 
of affairs and the theoretical predic- 
tions? Clearly this cannot be done 
on the basis of probability, nor can it 
be built into a standard significance 
test. The seriousness of disagree- 
ment depends upon the structure of 
the theory and the nature of the 
groups being compared. For some 
theories, the fact that Populations A 
and B are equal could be a very cru- 
cial defect in the theory; in other 
cases, this might be only a minor 
point, easily rectified. If the rela- 
tive importance of all of the possible 
comparisons were stated in advance, 
with some kind of numerical weights, 
it would be possible, although very 
complicated, to compute probabilities 
for each outcome and also some kind 
of a weighted risk function. This 
would differ from experiment to ex- 
periment and would probably be of 
little practical value. 

To summarize, it is argued that 
comparisons decided upon a priori 
from some psychological theory 
should not affect the nature of the 
significance tests employed for multi- 
ple comparisons. Our reasons may be 
recapitulated as follows: 

1. Ordinarily, no statement is made 
in advance as to what will be consid- 
ered substantial agreement, partial 
agreement, partial contradiction, or 
complete disagreement with the the- 
ory. Even if such a statement were 
made, the probabilities of each of 
these conclusions being drawn incorrectly 
would have to be included in 
the error rate. 

2. A theory which predicts the 
complete order of the results calls for 
just as many comparisons as the em- 
pirical experiment in which no pre- 
diction is made. Since the number of 
comparisons to be made is a crucial 
factor in the error rate, there is no 
difference in this respect between a 
priori and a posteriori comparisons. 

3. Some comparisons may be more 
important to a theory than others. It 
is not feasible, however, to take ac- 
count of this fact in devising signifi- 
cance tests or methods of setting con- 
fidence limits, since the relative 
weights would differ from experiment 
to experiment and would have to be 
specified quantitatively in advance. 
It is therefore more practical to ex- 
amine the results in a common-sense 
manner and to evaluate qualitatively 
the degree of support or contradic- 
tion offered by the data. 

Nonindependence of comparisons. 
In the textbooks, the student is some- 
times warned against a posteriori 
comparisons, because the different 
comparisons are not independent of 
each other. While lack of independ- 
ence is a factor to be taken into ac- 
count, it is not at all the main prob- 
lem. In fact, the error rates per com- 
parison and per experiment are com- 
pletely unaffected by independence 
or lack of it. The only important 
factor in these rates is the number of 
comparisons to be made. Only the 
experimentwise error rate is affected 
by independence. If all of the com- 
parisons are perfectly positively cor- 
related, all are significant or nonsig- 
nificant en bloc. Then the experi- 
mentwise rate is equal to the rate per 
comparison. In the case of complete 
independence, of negatively correlated 
comparisons, or even of moderate 
positive correlation, the experimentwise 
rate is nearly equal to the 
rate per experiment, when the latter 
is small. Most cases of multiple com- 
parison fall into the latter category, 
so that the dependence of the com- 
parisons has but a slight effect. 
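
A small simulation may make these statements concrete. The sketch below is our own illustration under stated assumptions, not a procedure from the literature cited: it takes k = 10 two-tailed z tests under the complete null hypothesis, gives every pair of test statistics the same non-negative correlation `rho` (a simplification; the dependence among real multiple comparisons has a different structure, and negative correlation is not covered), and estimates both the number of errors per experiment and the experimentwise rate.

```python
import random
from statistics import NormalDist


def simulate(rho: float, k: int = 10, alpha: float = 0.01,
             trials: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Return (errors per experiment, experimentwise rate) under the null.

    Each simulated experiment yields k two-tailed z tests whose statistics
    share a common correlation rho (0 <= rho <= 1), built from one shared
    normal component plus an independent component for each test.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shared_w, own_w = rho ** 0.5, (1.0 - rho) ** 0.5
    total_errors = 0
    experiments_with_error = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        errors = sum(
            abs(shared_w * shared + own_w * rng.gauss(0.0, 1.0)) > crit
            for _ in range(k)
        )
        total_errors += errors
        experiments_with_error += errors > 0
    return total_errors / trials, experiments_with_error / trials


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        per_experiment, experimentwise = simulate(rho)
        print(f"rho={rho:.1f}  per experiment={per_experiment:.3f}  "
              f"experimentwise={experimentwise:.3f}")
```

The per experiment figure should stay near k times the per comparison rate whatever the value of `rho`, while the experimentwise figure should fall toward the per comparison rate only as `rho` approaches 1.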

In the multiple comparison prob- 
lem, the lack of independence in- 
volves the fact that each mean is 
compared with every other mean, 
and therefore appears in a number 
of different significance tests. In 
many cases also a single error esti- 
mate is used for all comparisons. The 
significance tests are therefore not 
independent of each other, but this 
turns out to be less important than 
was once believed. Another kind of 
dependency must also be considered. 
The samples used in determining the 
various means may also not be inde- 
pendent of each other, notably in the 
case where the same Ss are used for 
each experimental condition. Such 
dependencies are easily taken care of 
by using two-way analysis of vari- 
ance with Ss considered as a second 
variable. 

The above conclusions on the rela- 
tive unimportance of the factor of 
independence in multiple compari- 
sons do not necessarily apply to other 
cases of multiple significance tests. 
The other cases listed in the introduction 
involve other kinds of dependency 
and must be analyzed separately. 


THE CHOICE OF ERROR RATES 


In making multiple comparisons, 
then, neither specifying the tests in 
advance nor trying to arrange for 
independent tests is of much importance, 
since they have little practical 
effect upon any of the error 
rates. It is of much greater practical 
importance to consider which of the 
error rates is the best representation 
of the dependability or “significance” 
of our results. We may work at the 
.01 level on a per comparison basis, 
yet the probability may be almost 
1.00 that we have made some errone- 
ous statements of significance in a 
given experimental report. In our 
current psychological literature the 
various bases for error rates are con- 
fused and sometimes used inter- 
changeably. 

The problem we must consider is 
the implication of using a particular 
measure of error rate for clarity and 
consistency of treatment of our re- 
search results. The issue can be made 
concrete in an example: One experi- 
menter performs a series of four ex- 
periments. In the first experiment 
he compares Groups A and B, in the 
second, Groups B and C, etc. Each 
experiment is published separately 
with a t test applied to the difference 
of means in each case. In each paper 
he summarizes the results obtained 
before and in the final paper he com- 
pares all five groups, still using simple 
t tests. A second experimenter, not 
so anxious for rapid and numerous 
publications, waits until all the results 
are in on all five groups, and 
performs an analysis of variance on 
all groups, considering the results as 
significant only if the F value is beyond 
the .01 point. Both of these 
kinds of report are quite typical of 
the psychological literature. The 
second experimenter has used an 
experimentwise rate of .01, at least for 
the complete null hypothesis, but he 
does not yet have any method of 
making specific comparisons between 
groups. The first experimenter has 
used a .01 level per comparison, but 
the experimentwise rate for the whole 
series of connected comparisons may 
be as high as .10, depending on how 
many of the possible comparisons 
among the five means are actually 
made. (To simplify matters, we assume 
that the first experimenter 
would have performed all four ex- 
periments regardless of the results. 
If he waited for the results of each 
experiment before deciding whether 
to continue the series, matters would 
be further complicated.) 

These examples should make clear 
that both the per comparison and the 
experimentwise rates are actually in 
use in typical researches now in the 
psychological literature, even when 
only classical techniques are used. 
The second example is now the more 
common approach, and even the first 
experimenter would probably be 
more likely to perform an analysis of 
variance in his last paper to sum- 
marize the over-all results. Whether 
he would be willing to retract earlier 
conclusions if the final analysis did 
not prove to be significant is, of 
course, an embarrassing question. 

The current widespread use of anal- 
ysis of variance would suggest 
adopting the experimentwise error 
rate as the standard practice. Cur- 
rent practice is not, however, suffi- 
cient justification unless it is based 
upon careful analysis. We must 
therefore examine the problem more 
fully. 
Since the rate per comparison is 
the easiest to use and requires no new 
methods at all, we may first consider 
the main argument in its favor. It 
might be contended that it makes no 
difference whether specific compari- 
sons are made one at a time by dif- 
ferent experimenters, or in groups by 
a single experimenter. The same 
amount of data is added to the published 
literature in either case. Therefore 
if the simple t test is justified 
in one case it should be justified in 
the other also. As Tukey (see Foot- 
note 2) states this argument (which 
he considers fallacious), the man who 
has studied several means at once has 
done more work and should be entitled 
to make more erroneous conclusions. 

There is, however, one very strong 
reason why an experimenter who 
studies a number of different groups 
or conditions has less justification 
for using a per comparison rate than 
an experimenter who performs a sin- 
gle experiment with two groups. 
Even if the complete null hypothesis 
is true, and the experimenter is work- 
ing with a factor or group of factors 
completely irrelevant to the behavior 
he is studying, the more conditions 
or the more variations of experi- 
mental conditions he studies the 
more chance he has of finding some 
differences which would appear to be 
significant on a per comparison basis. 
Thus he can obtain more “signifi- 
cant” differences by working harder 
upon irrelevant variables. This is 
the reason why Tukey considers the 
point of view of allowing erroneous 
conclusions in proportion to the 
amount of work done as an untenable 
point of view. 

The notion of allowing more errors 
per experiment in proportion to the 
amount of work done in the experi- 
ment would lead to another practice 
which is contrary to present usage. 
It would mean that the significance 
level in the ordinary two-group ex- 
perimental design could be reduced 
as the number of cases is increased. 
In this situation we ordinarily main- 
tain the significance level constant, 
but we gain through increases in 
power as the number of observations 
is increased. In the case of multiple 
comparisons we do not gain in power 
in the specific comparisons as the 
number of groups increases, but there 
is a compensation in that more in- 
formation about more different rela- 
tionships is gained as the number of 
comparisons increases. 

There are even objections to the 
use of error rate per comparison in a 
certain type of “experiment” in 
which only two groups are com- 
pared. Consider the following situa- 
tion: an experimenter is convinced 
that a certain factor should produce a 
difference in learning rate. He tries 
it once and fails to get a significant 
difference. He is so sure that the ex- 
periment should have worked that 
he reconsiders his experimental tech- 
nique for possible errors. He decides 
that some actually irrelevant feature 
of the experiment is responsible, 
changes it, and tries again. Finally 
after many different revisions of the 
conditions, all actually irrelevant, he 
obtains a “significant” difference and 
publishes the result. We assume that, 
as an honest scientist, he will men- 
tion in his report that several other 
trials failed, but this will not usually 
affect his test of significance, and he 
will usually explain away the earlier, 
unsuccessful trials as due to errors in 
technique. Clearly, all of his data 
should be tested as a single experi- 
ment, otherwise obtaining a “signifi- 
cant” difference will depend only 
upon the experimenter’s stubborn- 
ness and patience, or upon the num- 
ber of his research assistants. 
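
How quickly persistence pays off can be shown with a short calculation. The sketch below is an illustration under an assumption the text does not state, namely that the successive attempts are independent tests of a true null hypothesis at a fixed level; the function names are ours.

```python
import math


def prob_at_least_one(alpha: float, tries: int) -> float:
    """Chance that one or more of `tries` independent null experiments
    reaches 'significance' at level alpha."""
    return 1.0 - (1.0 - alpha) ** tries


def tries_for_even_odds(alpha: float) -> int:
    """Smallest number of independent attempts for which the chance of
    some 'significant' result is at least one half."""
    return math.ceil(math.log(0.5) / math.log(1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        n = tries_for_even_odds(alpha)
        print(f"alpha={alpha}: {n} attempts give "
              f"P(some 'significant' result) = {prob_at_least_one(alpha, n):.3f}")
```

On these assumptions the outcome depends only on the level of the test and the number of attempts, which is the sense in which the result reflects the experimenter's patience rather than the factor under study.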

Error rate vs. Type II error and 
power. Several psychologists to 
whom the above argument has been 
presented have raised objections to 
the experimentwise error rate on the 
ground that it leads to great loss of 
power. They point out that a t ratio 
may have to be as high as 4 or 5 for 
20 degrees of freedom to be significant 
at the .01 level instead of 2.85 as it is 
when significance is measured in the 
classical way. If this happens, they 
say, we are obviously going to miss a 
lot of real differences which might 
turn out to be important. 

While it is perfectly true that the 
bigger the difference which is required 
for significance, the less powerful 
is the test⁵ (other things being 
equal, of course), this fact is irrelevant 
to the issue involved in the 
choice of error rates. If the experimenter 
prefers, he can still use a t 
value of 2.85 as his criterion of significance 
even if he reports his results 
in terms of error rates experimentwise. 
The difference is that his 
results will be reported as significant 
at (say) the .50 level experimentwise 
instead of the .01 level on a per comparison 
basis (which is usually not 
labelled as such). 

In other words, the issue is not 
more or less powerful tests, since the 
power can be adjusted to any de- 
sired level, but simply how we are 
going to evaluate the Type I error. 
It must be admitted, however, that 
the tables which are available for 
establishing error rates on an experi- 
mentwise basis tend to limit the ex- 
perimenter to a fixed value of the 
error rate (usually .05). This situa- 
tion can be changed, however, if there 
is good reason to increase the power 
of the tests. 

Duncan's compromise. Duncan 
(1951, 1955) has argued that there is 
not only a loss of power in changing 
from the per comparison to the experimentwise 
basis, but that this 
loss of power becomes progressively 
greater as the number of comparisons 
increases. Since his method has been 
used in several recent research papers 
in psychology, we shall examine his 
assumptions in detail. Thus if one 
experiment involves 5 means while 
another experiment involves 10 
means, and both are evaluated by 
holding the experimentwise error 
fixed at .05, the experiment with the 
10 means is less powerful in the sense 
that each difference must be larger 
to be judged significant. 


5 See Harter (1957) for evaluation of the 
power of several multiple comparison procedures. 


According to Duncan, this state of 
affairs should be reversed. As more 
and more conditions are studied it is 
more and more likely that some real 
differences exist, and therefore the 
statistical tests should become more 
powerful as the number of compari- 
sons increases. This would be the 
case if we used the error rate per com- 
parison as our basis of establishing 
significance, but then the probability 
of Type I error reaches unreasonably 
high levels. As his compromise, Dun- 
can proposes to base statements of 
significance upon the rate of error 
per independent comparison or per 
degree of freedom. 

The argument for Duncan's 
method would be that when two dif- 
ferent experimenters each perform a 
simple comparison of two groups we 
allow them each a certain error rate, 
because they have performed two 
independent comparisons. It is pro- 
posed that this allowance be ex- 
tended to a single experiment in- 
volving several comparisons. If 10 
means are compared there are 45 
comparisons, but only 9 can be made 
if we are to keep them independent 
of each other. Table 1 indicates the 
relationship among the rates per comparison, 
per degree of freedom and 
per experiment for multiple means. 


TABLE 1 

ERROR RATES PER EXPERIMENT WHEN ERROR RATES PER COMPARISON AND 
ERROR RATES PER DEGREE OF FREEDOM ARE CONTROLLED ᵃ 

No. of     Error Rate per          Error Rate per Degree 
Means      Comparison Fixed        of Freedom Fixed 
           at .01                  at .01 

   2           .01                     .01 
   3           .03                     .02 
   4           .06                     .03 
   5           .10                     .04 
  10           .45                     .09 
  20          1.90                     .19 
  50         12.25                     .49 

ᵃ Based on the complete null hypothesis. 
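
The entries of Table 1 appear to follow from two simple products, and the sketch below recomputes them on that reading. It is our own check, not a published routine: with m means there are m(m-1)/2 comparisons and m-1 degrees of freedom, and each is multiplied by the fixed rate of .01. The function name is hypothetical.

```python
def error_rates_per_experiment(n_means: int, alpha: float = 0.01):
    """Return (number of comparisons, rate per experiment when alpha is
    fixed per comparison, rate per experiment when alpha is fixed per
    degree of freedom) for n_means group means."""
    n_comparisons = n_means * (n_means - 1) // 2
    per_comparison_basis = round(n_comparisons * alpha, 2)
    per_df_basis = round((n_means - 1) * alpha, 2)
    return n_comparisons, per_comparison_basis, per_df_basis


if __name__ == "__main__":
    print("means  comparisons  per-comparison basis  per-df basis")
    for m in (2, 3, 4, 5, 10, 20, 50):
        k, by_comparison, by_df = error_rates_per_experiment(m)
        print(f"{m:5d}  {k:11d}  {by_comparison:20.2f}  {by_df:12.2f}")
```

The contrast between the two columns is the point of Duncan's compromise: the per comparison basis grows roughly with the square of the number of means, the per degree of freedom basis only in proportion to it.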

The arguments against the per 
comparison basis of testing also ap- 
ply, although not so powerfully, 
against Duncan’s compromise pro- 
cedure. The experimenter still can 
increase his chances of finding sig- 
nificant differences by multiplying 
the number of irrelevant conditions 
which he studies in a given experi- 
ment. The probability of erroneous 
conclusions does not increase as fast 
as the number of comparisons in- 
creases, but it still increases. 

While it is true that it becomes less 
and less likely that all populations 
have the same mean, as we increase 
the number of groups, it is not known 
to what extent these differences are 
relevant to the problem being studied. 
We may therefore merely be increas- 
ing our probability of detecting dif- 
ferences which are the random result 
of factors which are not under study 
in the experiment. In other words, we 
increase the risk of finding a hodge- 
podge of “significant differences” 
which cannot be given a meaningful 
interpretation. 

Duncan’s approach is also contrary 
to common practice, where the 
F test is applied at the same proba- 
bility level, regardless of the number 
of groups under study. To be sure, 
common practice in analysis of vari- 
ance could also be wrong and could 
be revised according to Duncan’s 
point of view, but we need to have 
some stronger arguments for doing so. 

Since the degree of conservatism 
and the inversely related power of 
the test can be explicitly varied by 
choosing varying rates of error per 
experiment or experimentwise, de- 
pending upon the type of material 
being studied and the purposes for 
which conclusions are being drawn, 
it does not seem necessary to adopt a 
rigid compromise between the per 
comparison and the experiment-based 
rates. Thus Duncan’s special pro- 
cedure seems unnecessary and may 
confuse the issues for the user of sta- 
tistics. 

Rates per experiment vs. experi- 
mentwise rates. While we cannot say 
flatly that all significance tests or all 
confidence limits must be based upon 
the experiment as a unit, there are, as 
we have seen, strong reasons to make 
the experiment the normal reference 
unit at least. In any event, it should 
always be made clear in an experi- 
mental report which approach is be- 
ing used. If the rate per comparison 
is chosen, it should require special 
justification. 

Although the two experiment- 
based error rates are often numeri- 
cally almost equal, they do represent 
somewhat different points of view 
about the nature of the conclusions 
from an experiment. In one case we 
control the total number of erroneous 
statements made in each experiment 
(rate per experiment). In the other, 
we consider that any erroneous state- 
ment spoils the conclusions from that 
experiment. In other words, the ex- 
perimentwise rate is based on the as- 
sumption that it is just as bad to 
make one erroneous conclusion as it 
is to make six in the same experi- 
ment. 

If we have to make a choice be- 
tween these two approaches, it will 
depend upon rather subtle differ- 
ences in the manner in which the ex- 
perimental conclusions are to be 
used. If the total set of conclusions 
is considered as a pattern supporting 
some theoretical position in such a 
way that any erroneous statement 
would destroy the pattern, then the 
experimentwise basis is clearly the 
one to use. If each fact can be interpreted 
independently of the other 
findings of the experiment, the per 
experiment basis is more appropriate. 

In practice, our interpretation of 
experimental findings probably does 
not fall clearly at either of these extremes. 
One erroneous statement 
probably will not completely destroy 
the value of the findings, but, on the 
other hand, each “fact” must be interpreted 
in some relation to the 
others. We would therefore be 
in difficulties, if the choice between 
the two bases were crucial. 

In a great many of the common experimental 
designs, the per experiment 
basis can be worked out from 
the tables of standard tests already 
in existence although they 
must be more extensive than those 
given in the textbooks. Special tables 
must be developed for the experimentwise 
error rates, but a number 
of these are already available. The 
only practical advantage of the 
per experiment rate lies, therefore, in 
cases where special tables are 
not available for experimentwise 
rates. If there is any discrepancy between 
results computed in the two 
ways, the per experiment basis is 
more conservative than the experimentwise 
basis. We are therefore 
safe in using the rate per experiment 
when in doubt, or when the experimentwise 
rate cannot be calculated, 
since the error rate per experiment 
is always larger than or equal to the 
experimentwise rate. 

An algebraic statement of the relationships 
among the various error 
rates may help to show why some of 
the previous statements about them 
are true. Let: 


$p_{11}$ = probability of one erroneous 
statement in a single trial 
using a particular critical 
value in a certain test (for 
example if t is considered significant 
when it exceeds 2.75 
and the degrees of freedom 
for error are 30, $p_{11}$ = .01). 
This is error rate per comparison. 

$p_{1k}$ = probability of exactly one erroneous 
statement out of a 
total of k statements which 
are made. 

$p_{2k}$ = probability of exactly two erroneous 
statements out of k, 
etc. 

EP = error rate per experiment 

EW = error rate experimentwise 


Then, by definition: 


$$EP = \text{expected number of errors per experiment} = p_{1k} + 2p_{2k} + 3p_{3k} + \cdots + kp_{kk}$$ 

$$EW = p_{1k} + p_{2k} + p_{3k} + \cdots + p_{kk}$$ 


It can also be shown that $EP = kp_{11}$. 
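
As a numerical check on these definitions, the sketch below evaluates both sums for one special case. The binomial form of the probabilities is an assumption made here for illustration (it holds only if the k statements are independent); the relation between EP and the per comparison rate does not depend on it. Function names are ours.

```python
from math import comb


def prob_exactly(j: int, k: int, p: float) -> float:
    """Probability of exactly j erroneous statements out of k, assuming
    independent statements that each err with probability p."""
    return comb(k, j) * p ** j * (1.0 - p) ** (k - j)


def ep_and_ew(k: int, p: float) -> tuple[float, float]:
    """Return (EP, EW): the expected number of erroneous statements and
    the probability of at least one, summed term by term."""
    probs = [prob_exactly(j, k, p) for j in range(1, k + 1)]
    ep = sum(j * q for j, q in enumerate(probs, start=1))
    ew = sum(probs)
    return ep, ew


if __name__ == "__main__":
    ep, ew = ep_and_ew(k=45, p=0.01)
    print(f"EP = {ep:.4f}   EW = {ew:.4f}   k * p = {45 * 0.01:.4f}")
```

Because each term of EP carries a weight of at least 1, the two sums differ only through the terms for two or more erroneous statements.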


Thus EP is always greater than 
EW, and the difference between them 
depends upon the probabilities of 
more than one erroneous statement. 
The very simple relationship between 
EP and $p_{11}$ shows why EP can be 
calculated with standard tables. Sup- 
pose, for example, that we are 
comparing 10 means. There are then 
(10)(9)/2 = 45 comparisons to be 
made. If each difference were tested 
with an ordinary t test at the .01 
level, EP is 45(.01) or .45. To reduce 
EP to the .01 level, we simply reduce 
$p_{11}$ to .01/45 = .00022 and find the 
corresponding value of t. The approximate 
value of the required t can 
be obtained from Pearson and Hartley's 
Table 9, “Probability Integral 
of the t-Distribution” (Pearson & 
Hartley, 1954). It turns out to be 
about 4.1 in the case given in the example 
above, where there are 30 degrees 
of freedom. Thus, changing 
the critical value of t from 2.75 to 4.1 
changes our error rate from .01 per 
comparison to .01 per experiment. 
(It is assumed that all differences are 
to be tested against a common critical 
value of t. Later we shall show 
that it is possible to obtain a more 
sensitive significance test by using 
variable t ratios depending upon the 
observed order of the means.) 
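
For readers without the printed tables, the critical value can be approximated numerically. The sketch below is our own and rests on assumptions not in the text: it integrates the density of the t distribution by Simpson's rule, treats the test as two-tailed, and finds the critical value by bisection. Its result for 45 comparisons and 30 degrees of freedom should fall in the neighborhood of the tabled figure, though not necessarily on it, since the figure read from the printed table is itself only approximate.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| > t), from Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def critical_t(p: float, df: float) -> float:
    """Value of t whose two-tailed probability equals p, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if two_tailed_p(mid, df) > p:
            lo = mid   # tail still too heavy: move the criterion outward
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    n_comparisons, df = 45, 30
    print(f"per comparison .01: t = {critical_t(0.01, df):.2f}")
    print(f"per experiment .01: t = {critical_t(0.01 / n_comparisons, df):.2f}")
```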

By methods which we shall not 
discuss here, the EW rate can also 
be used to find the critical value of t. 
In the example we are considering, 
the EW rate turns out to require a 
value of 4.07 for t. The slight difference 
is partly due to the gaps in the 
table of t, so that the 4.1 is only approximate. 
Table 2 gives some further 
examples of comparative critical 
values of t for experimentwise and 
per experiment rates of .01. 

Fortunately, as the above exam- 
ples show, it is usually not necessary 
to make the difficult decision be- 
tween rates per experiment and ex- 
perimentwise in terms of the logic of 
the experimental interpretation. In 
practice it becomes merely a matter 
of computational convenience. 


TABLE 2 

CRITICAL VALUES OF t FOR TESTING DIFFERENCES AMONG SEVERAL 
MEANS FOR ERROR RATES OF .01 

                      Critical Values of t 
No. of    df for      For EP=.01          For EW=.01 
Means     error       (approximate) ᵃ 

   5        20           3.9                 3.7 
            30           3.7                 3.6 
            60           3.5                 3.4 
           120           3.4                 3.3 
  10        30           4.1                 4.1 
            60           4.0                 3.9 
           120        [illegible]            3.7 
  20        60        not covered            4.3 
           120        by tables              4.1 

ᵃ To avoid interpolation, values are taken to the 
next tenth above the critical value. 


Significance tests vs. confidence 
ranges. Several of the methods now 
available for multiple comparisons 
give us conclusions in the form of 
statements of significance: “The difference 
between A and B is significant, 
that between B and C is not 
significant, etc.” Others make all 
comparisons in terms of confidence 
ranges of the difference: “The difference 
between A and B is from 2 to 
15, the difference between B and C 
is from -3 to 10, etc.” 

When there are only two means to 
be compared, significance statements 
can be rather easily translated into 
confidence ranges, and vice versa. In 
the case of multiple comparisons, 
however, the relationship is more 
complex. For example, the confidence 
range for the difference between 
B and C above is from -3 to 
10, yet a method of testing for significance 
of differences, with the same 
error rate, might label the difference 
between B and C as significant. 

This discrepancy is because the 
most sensitive or powerful signifi- 
cance tests apply a different criterion 
of significance to different pairs of 
means, depending on how far apart 
they are in the total group. Thus 
the two extreme means must be far- 
ther apart for significance than two 
which are next to each other. The 
methods of determining confidence 
ranges have developed a single 
“allowance” which is applied to each of 
the differences regardless of where 
the means are in the total group. 

Tukey has argued for the almost 
universal use of confidence ranges in- 
stead of significance statements, bas- 
ing his case primarily upon the point 
that confidence ranges contain more 
information and information which 
is more useful to future researchers 
than a statement of significance. 
Whether or not he is correct in this 
contention, the fact remains that 
most of our familiar statistical tools 
(e.g., the F and chi-square tests) are 
significance tests. As a result most 
psychological researchers are more 
accustomed to thinking in terms of 
significance. It will therefore require 
a long period of readjustment if 
Tukey’s point of view is to prevail. 

Because it is more in keeping with 
current practice and ways of thinking 
in psychology, most of this paper 
is couched in terms of statements of 
significance. It is important to realize, 
however, that there is more difference 
between the two approaches 
when we are involved in multiple 
comparisons than there is in the 
simple case of two means. 

Comparisons and contrasts. All of 
the discussion so far has been directed 
at the comparison of one mean with 
another mean in the group. Sometimes, 
however, other problems arise. 
We might, for example, wish to di- 
vide the means into two groups of 
means, by inspection of the data, and 
to state whether the two groups of 
means differ significantly from each 
other. In the literature of this field, 
the term contrast is used for the comparison 
of any combination of means 
with another combination. Contrasts 
include cases where the means are 
combined with differential weights 
for different groups. 

Some of the procedures now available 
make it possible to test for significance 
or place confidence limits 
on all possible contrasts among the 
means of a given experiment. Methods 
which are effective for such 
general purposes are not so effective, 
however, for the case of simple comparisons 
of one mean with another. 


SPECIFIC METHODS FOR MULTIPLE 
COMPARISON OF MEANS 


We shall describe very briefly 
some procedures which have been 


proposed for solving the problem 
of multiple comparisons. Tukey’s 
methods will be stressed especially 
because of convenience, because of 
their control of experimentwise error 
rates for all null hypotheses, and be- 
cause of their special suitability for 
simple comparisons of means. 
Several of these procedures involve 
something which we may call a “lay- 
er method.” By this we mean that 
the observed means (or other meas- 
ures) are first ranked from low to 
high. A first test is applied to the 
difference of the extremes. If this 
difference is not considered significant 
by the rules of the method, no fur- 
ther tests are made. If the extremes 
are significantly different an extreme 
value is tested against the value next 
to the other extreme, and so on. The 
effect is that no differences within a 
group or any subgroup of means can 
be considered significant if the ex- 
tremes of the subgroup are not sig- 
nificantly different. The size of 
difference required for significance 
changes with the separation of the 
means in the rank-ordered array. 
The layer method would be con- 
trasted with a method by which a 
fixed interval is required for signifi- 
cance of any difference, no matter 
where the means fall in the rank or- 
der of the results. It would also be 
contrasted with a “gap” method in 
which adjacent means are tested at 
once. If a gap is significant then all 
means on one side of the gap are sig- 
nificantly different from those on the 
other side. Tukey’s earlier method 
(1949) was based upon gaps, but he 
has since abandoned the procedure 
as unsatisfactory.
The method of Newman and Keuls. 
This procedure, first suggested by 
Newman (1939) and refined by 
Keuls (1952), controls the error rate 
experimentwise only for the complete 
null hypothesis. It is based upon
“layers” and uses the distribution of
the range of “studentized” means as
a test. The level of the test is kept at
the same nominal value throughout.
As a result, the error rate can rise to
mα/2 (m being the number of means,
α being the nominal significance
level), when the means are equal in
pairs. Tukey’s method discussed be- 
low is a modification which corrects 
this difficulty. 

Duncan's procedures. Duncan 
(1955) has presented tables for two 
different layer procedures. One is 
based upon an F test for each sub- 
group, and the other based upon the 
range. In both cases, however, the 
error rate is set at α per degree of
freedom, so that (m−1)α is the error
rate experimentwise. We have agreed 
with Tukey, however, in considering 
this compromise basis of controlling 
error as still open to objection. Those 
who prefer this approach will find 
Duncan’s tables useful. 
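
The two error rates quoted above for the Newman-Keuls and Duncan procedures can be tabulated for a few group sizes. The following is only a minimal sketch: the function names are hypothetical, and the formulas are simply the expressions mα/2 and (m−1)α as stated in the text, not an independent derivation.

```python
def newman_keuls_max_rate(m, alpha):
    """Rate m*alpha/2 quoted in the text for the Newman-Keuls method
    when the m means are equal in pairs."""
    return m * alpha / 2


def duncan_experimentwise_rate(m, alpha):
    """Rate (m - 1)*alpha quoted in the text for Duncan's procedures,
    where alpha is the error rate per degree of freedom."""
    return (m - 1) * alpha


if __name__ == "__main__":
    alpha = 0.05
    for m in (4, 6, 10):
        print(m,
              round(newman_keuls_max_rate(m, alpha), 3),
              round(duncan_experimentwise_rate(m, alpha), 3))
```

With ten means and a nominal level of .05, these expressions give .25 and .45 respectively, which illustrates why neither procedure holds the experimentwise rate at the nominal level.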

Bechhofer's method. Bechhofer 
(1954; Bechhofer, Dunnett, & Sobel, 
1954; Bechhofer & Sobel, 1953) has 
considered a problem related to that 
of multiple comparison, but one 
which is formulated in a quite dif- 
ferent way. Instead of testing the 
null hypothesis that all means are 
equal, he assumes a situation in 
which we already know that there 
must be differences among the means. 
The problem is not one of testing 
significance, nor of setting confidence 
limits, but of finding the relative or- 
der of the means with a predeter- 
mined probability of being correct, or 
of choosing only the highest of the 
means, the first and second highest, 
etc. Since Bechhofer is dealing with 
such a different problem from the one 
we have been considering, we shall 
not attempt to describe the method 
here. Those who are concerned with 
this kind of problem in psychological
work are referred to Bechhofer’s
papers.


Tukey's methods. As we have 
stated above, Tukey (see Footnote 
2) has made the most detailed and 
thorough analysis of the problem of 
multiple comparisons. It is unfor- 
tunate that his paper is not generally 
available and has not yet been pub- 
lished. If it had been, the present ac- 
count could be much shorter. It will 
be remembered that Tukey favors 
an approach based upon confidence 
limits, rather than statements of sig- 
nificance, but he has presented meth- 
ods for both in his paper. Using the 
procedures he has developed or 
adapted we can: (a) set simultaneous 
confidence limits for all differences 
among means in a one-way analysis 
of variance design at the level α experimentwise;
(b) set simultaneous
confidence limits for all comparisons
among groups of means and all linear
functions of the means (e.g., 2M₁ + M₂ − 3M₃)
(“contrast” and “universal”
allowances); (c) set simultaneous
confidence limits for all comparisons
or contrasts in a two-way
analysis using a variablewise (“familywise”)
error rate. That is, the
error rate is α for each dimension of
the analysis; (d) set confidence limits 
for interactions; (e) make simul- 
taneous significance tests on any of 
these. 

In addition, he has developed three 
different methods of calculation 
based (a) on variances and t ratios of
the ordinary sort, (b) using short cuts 
based upon ranges instead of calcu- 
lating standard errors, and (c) an in- 
termediate method which he calls 
the “half-cut” procedure. The tables 
for the short-cut method at the 5% 
level for both one-way and two-way 
analysis of variance, have been made 
available to psychological researchers 
by Mosteller and Bush (1954, pp.
304-307). Tukey has also analyzed
the effects of non-normality of the 
populations upon these procedures, 
concluding that the short cut procedures
are most sensitive to the effects
of non-normality and the longer
procedures are safest if there are
doubts about the normality of the
populations. Unfortunately, Mosteller
and Bush’s brief presentation does
not make clear that the short cut is 
sensitive to non-normality, nor do 
they make explicit the issue of error 
rates. The reader may not realize 
that, in two-way analysis, the error 
rate is 5% familywise or 10% experi- 
mentwise. 

In an appendix to this paper, we
shall outline the specific computations
for applying Tukey’s method to
simple comparisons of means. Here
we shall only indicate the general
principle involved. We already have
available the probability distribution
of the range of m means, based upon
the estimated standard error of the
mean (the tables of the “studentized”
range [Pearson & Hartley, 1954]).
The upper 5% point of this distribution
is the range which is exceeded
in only 5% of samples of m means
under the complete null hypothesis.
Therefore, 95% of the time none of
the differences among the means in
the group will be greater than the
range given at the “5% point.” We
can therefore take as our confidence
allowance the 5% value for all of the
differences in the group. The value
is added to and subtracted from each
of the observed differences, and
Tukey shows that the probability is
95% that all of these confidence
limits will include the true values,
whether the true differences
are zero or any other value, i.e.,
whether the complete null hypothesis
is true or not. By applying suitable
factors to these same “allowances”
they can be used for setting confidence
limits for any linear combination
of means, for interactions, and
so on. In setting confidence limits, the
same allowance is applied to all differences.
In making significance tests, however,
Tukey finds it possible to
deal with differences between means which are
closer together in the observed series by using
smaller allowances, without increasing the experimentwise
error rate. For example, the range of
a subgroup of four adjacent means is
tested by the mean value of two 5%
points; (a) that for the range of four 
isolated means, and (b) that for the 
whole m means of the total group. It 
is a compromise between the allow- 
ances used by Newman and Keuls 
and those used for the confidence 
limits. 

Scheffé's contrast allowances. This 
method (Scheffé, 1953) is similar to 
Tukey’s in its application, but it is 
based upon the F distribution rather 
than the range. For any given value 
of F there is a maximum difference 
which can occur between any pair of 
means. If we find this difference for 
the value of F at (say) the 5% point 
of F, then no difference can exceed
this critical value unless F is in the 
extreme 5% area of its distribution. 
Therefore this difference will be ex- 
ceeded not more than 5% of the 
time. Similarly, there is a maximum 
value which can be attained by any 
particular contrast among the means 
(e.g., 3M₁ + M₂ + M₃ − 5M₄), including
means of groups of means,
weighted sums of groups of means in
any possible combination, and so on. 
These values are used to set confi- 
dence limits for each contrast, with a 
specified experimentwise error rate. 

The allowances for simple differences
obtained by the Scheffé method
are larger than those obtained from
Tukey’s method based upon ranges.
For the ordinary multiple comparison
problem, Tukey’s method is therefore
more powerful or more sensitive.
For contrasts involving several
means, especially where we wish to
divide the means into two groups
and compare the means
of the two groups, Scheffé’s method
is the more sensitive. Where we are 
interested solely or primarily in sim- 
ple differences between pairs of 
means, we would not choose Scheffé’s 
method. (Note that we must choose 
between the methods for any particular
experiment; we cannot use
both.) Tukey’s analysis of the prob- 
lem includes a strong case for using 
the simple comparisons as the pri- 
mary basis for evaluating a method. 
There are serious logical difficulties 
involved in comparing groups of 
means or other contrasts, difficulties 
which would require too much space 
to specify here. 

Some other procedures. One special 
case of multiple comparisons occurs 
when one group is a control and we 
wish to compare all other groups 
with this one control. Dunnett 
(1955) has treated this case and has 
presented tables for controlling the 
experimentwise error rate.

Another problem occurring fre- 
quently is the comparison of frequen- 
cies of occurrence or proportions in 
multiple classes. Relatively little has 
been done on this problem, except 
for the special case of choosing the 
class with the highest proportion. 
The latter case is discussed by Kozel- 
ka (1956). The general multiple 
comparison problem for proportions, 
where we wish to compare the pro- 
portion in each class with each other 
class, can be solved readily if we wish 
to work with an error rate per experi- 
ment. We merely apply ordinary two- 
sample procedures to each pair using 
a probability level of α divided by
the total number of comparisons
(m(m−1)/2). Then the error rate
per experiment will be α and the experimentwise
error rate cannot be
larger than α. How much smaller the
error rate experimentwise would be 
still remains to be determined. 
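
The rule just described for proportions can be sketched as follows. This is an illustration only, under stated assumptions: the classes are treated as independent samples, the "ordinary two-sample procedure" is taken to be the pooled normal-approximation test for two proportions, and the function names and counts are hypothetical.

```python
import math
from itertools import combinations


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p value from the pooled normal-approximation test
    for the difference between two independent proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both proportions are 0 or 1; no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def pairwise_proportion_tests(counts, sizes, alpha=0.05):
    """Test every pair of classes at alpha divided by m(m-1)/2."""
    m = len(counts)
    n_comparisons = m * (m - 1) // 2
    per_test_alpha = alpha / n_comparisons
    results = {}
    for i, j in combinations(range(m), 2):
        p = two_proportion_p_value(counts[i], sizes[i], counts[j], sizes[j])
        results[(i, j)] = (p, p < per_test_alpha)
    return per_test_alpha, results


if __name__ == "__main__":
    # Hypothetical data: three classes of 100 cases each.
    level, results = pairwise_proportion_tests([30, 45, 60], [100, 100, 100])
    print("level per comparison:", round(level, 4))
    for pair, (p, significant) in results.items():
        print(pair, round(p, 4), significant)
```

Dividing α among the comparisons keeps the error rate per experiment at α, so the procedure errs on the conservative side with respect to the experimentwise rate.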


Other cases of multiple tests. As
noted at the beginning, there are
other situations in which a number of 
statistical tests are made upon one 
set of experimental results. While 
we have had space to discuss at 
length only the problem of multiple 
comparisons, we must emphasize that 
the same fundamental issues are in- 
volved in the other cases as well. 

In all of the cases there is the question
of whether to base the error rate upon the
individual comparison (as is frequently
done in the literature) or to
consider the error rate in relation to 
the experiment as the unit. Conclu- 
sions upon these other cases will not 
necessarily be the same as for multi- 
ple comparisons, since the purpose of 
the statistical analysis is different in 
each situation. Each of the cases re- 
quires an analysis similar to the one 
which we have made for the case of 
multiple comparisons, and, as yet, 
little has been done on most of them. 

As an example of the problems in- 
volved, we shall consider briefly just 
one of the other cases—that of mul- 
tiple F tests in a factorial experiment 
(Case 2 on p. 27). Hartley (1955) 
has described a method for control- 
ling the experimentwise rate of error 
in a multivariable analysis of vari- 
ance. By this method it is possible to 
test each source of variance in such a 
way that there is a specified proba- 
bility that there will be one or more 
incorrect conclusions in the total ex- 
periment. Hartley does not go into 
detail, however, as to the problem of 
deciding when the experimentwise 
rate should be used. 

The present writer believes that 
the same arguments which support 
the experiment-based error rates for 
multiple comparisons would also ap- 
ply to multiple F tests. In the mul- 
tiple comparison situation the exper- 
imenter can increase the probability 
of finding some (erroneously) significant
results by studying more and
more levels of the variable in the experiment
and by basing his significance
tests on the single comparison
rate. In the same way, one can in- 
crease the probability of finding some 
significant F ratios in an experiment 
by complicating the experiment with 
more and more irrelevant variables, 
while continuing to base the error 
rate upon the individual F. For example,
in a factorial design with five
variables there could be as many as
31 F ratios. If each were tested at the
usual “.05 level” the probability
that some of them would turn out to
be significant is almost .80, under the 
null hypothesis. 
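
The arithmetic behind this example can be checked with a short sketch. It assumes, as the figure of .80 implies, that the F tests are treated as independent; the function names are hypothetical.

```python
def n_f_ratios(n_variables):
    """Number of main effects and interactions in a complete factorial
    design: every non-empty subset of the variables gives one F ratio."""
    return 2 ** n_variables - 1


def prob_at_least_one_significant(n_tests, alpha):
    """Probability that at least one of n_tests independent tests is
    significant at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    k = n_f_ratios(5)
    print(k, round(prob_at_least_one_significant(k, 0.05), 3))
```

Five variables give 2⁵ − 1 = 31 F ratios, and 1 − .95³¹ is a little under .80.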


SUMMARY 


We have considered several basic
problems in multiple comparisons.
Our present position on these
problems is as follows:

1. In general the same procedures
should be used, whether the direction
of differences has been predicted in
advance or not. The same procedures
which apply to comparisons suggested
by the data should be applied
when the comparisons have
been specified in advance.

2. In general, the experiment
should be used as the unit in computing
error rates, rather than the individual
comparison or test.

3. Following Tukey's lead, the error
rate should be determined on the
basis of that null hypothesis which
maximizes the rate.

4. The error rate per experiment is
an upper limit for the error rate experimentwise,
and therefore provides
a test which can be used
when the experimentwise rate cannot
be computed.

5. The choice between the two experiment-based
error rates is usually
one of convenience, since they differ
but little numerically in the cases
where both procedures are available. 

6. The relative advantages of con- 
fidence limits vs. significance tests 
have not been treated in this discus- 
sion, but it is pointed out that the 
two methods do not lead to parallel 
conclusions in the case of multiple 
comparisons. 

7. Several of the available methods 
for multiple comparison are reviewed 
briefly. 


APPENDIX 


TUKEY’S METHOD FOR MULTIPLE
COMPARISONS⁶


We present here a brief set of instructions 
for Tukey's method for comparing individual 
means, making use of the tables of the “studentized
range.” These tables, as published
in Pearson and Hartley (1954, pp. 176-177) 
permit the comparison of up to 20 means in a 
group at either the 5% or 1% level experi- 
mentwise. Tukey has developed a table 
covering larger groups at the 5% level (see 
Footnote 2) but it has not yet been published. 


The following symbols are used throughout: 

s The standard error of any of the individual
means. In ordinary analysis of variance
this is the square root of “mean square
for error” divided by √a, where a is the
number of cases upon which each mean is
based. We assume that all groups are of
equal size.

ν Degrees of freedom in the determination
of s.

SR Percentage point of the studentized
range as read from the table.

WSD (Tukey's abbreviation for “wholly
significant difference”) the final allowance
used in establishing confidence limits or
determining significance of individual
comparisons. WSD = SR·s

n Number of means in the group being
compared.⁷

6 See Footnote 2.
7 Unfortunately, symbols appropriate to the
tables in Pearson and Hartley may be confusing
because they fail to correspond to common
practice in the psychological literature.
The reason is that the tables are set up in
terms of the range of individual values rather
than a range of group means, and we must
make appropriate translations of terms.

A. CONFIDENCE LIMITS FOR
ALL COMPARISONS


1. Determine s.

2. Determine SR by reading the table of the
studentized range for the appropriate confidence
level (5% or 1%), degrees of freedom
(ν), and n. Note that SR is the upper 5% or
1% point.

3. Find WSD by multiplying SR by s.

4. Adding and subtracting WSD for any
given difference between a pair of means will 
give the appropriate confidence limits for the 
difference of that particular pair, with the 
error rate controlled experimentwise. 
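
Steps 1 to 4 can be sketched in a few lines. This is only an illustration: the function name is hypothetical, SR is supplied by the user from the published table rather than computed, and the means, the value of s, and the value of SR in the usage lines are made-up inputs.

```python
from itertools import combinations


def wsd_confidence_limits(means, s, sr):
    """Confidence limits for every pairwise difference among the means.

    means -- the group means
    s     -- standard error of an individual mean (step 1)
    sr    -- tabled percentage point of the studentized range (step 2)
    """
    wsd = sr * s  # step 3
    limits = {}
    for i, j in combinations(range(len(means)), 2):
        diff = means[j] - means[i]
        limits[(i, j)] = (diff - wsd, diff + wsd)  # step 4
    return wsd, limits


if __name__ == "__main__":
    # Hypothetical inputs: six means, s = 2.0, and SR = 4.03 taken as the
    # approximate upper 5% point for six means with many degrees of freedom.
    wsd, limits = wsd_confidence_limits([8, 19, 22, 23, 27, 28], s=2.0, sr=4.03)
    print("WSD =", round(wsd, 2))
    for pair, (low, high) in limits.items():
        print(pair, round(low, 2), round(high, 2))
```

The same WSD is applied to every difference, whatever the position of the two means in the rank order.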


B. SIGNIFICANCE TESTS 
Definitions 


For any given pair of means, let k be the 
number of means in the subgroup including 
the two means, i.e., two plus the number of 
means between them. For example, in the 
series: 8, 19, 22, 23, 27, 28, when we test the
difference between 19 (M₂) and 27 (M₅), k
is 4. 

In the following steps “testing a given pair 
of means” will mean the following: (a) Find 
SR corresponding to n, the total number of
means being compared. (b) Find the value of
SR corresponding to k for that group (i.e.,
reading the table for k in place of n). (c) Find 
the mean of these two values of SR. (d) Find 
the mean WSD by multiplying s by the mean 
SR. (e) The difference between the pair of 
means is considered significant if it is greater 
than this mean WSD, but there are also re- 
strictions on the order of testing to be de- 
scribed below. 


Procedure: 


1. Determine s (see definition).


2. Arrange the means in order of magni- 
tude. 

3. Test the difference between the extreme 
values, using the WSD for the total number 
of cases. (For the extreme values, k=n.) If
the extreme means are not significantly differ- 
ent, no further tests are made, and we con- 
clude that there are no significant differences 
in the group. 


If the extremes are significantly different: 

4. Test each extreme mean against the 
mean next to the other end of the array, using 
the mean WSD for k=n−1. If neither of these
tests is significant, we stop with the conclu- 
sion that only the extremes of the group differ 
significantly. 


If either or both of the tests are significant: 

5. Test all subgroups with k=n−2. Continue
until all subgroups of a given size are
found to be nonsignificant. 
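
The definitions and procedure above can be sketched as a short routine. It is a minimal illustration under stated assumptions: the function names are hypothetical, the SR values are supplied as a small lookup table keyed by the number of means, and a pair below the top layer is tested only when it lies inside a pair one layer up that has already been found significant. The value s = 2.0 is made up, and the SR values are approximate upper 5% points for many degrees of freedom that should be checked against the published table.

```python
def mean_wsd(s, sr_table, n, k):
    """Mean WSD for a subgroup of k means out of n:
    s times the mean of SR for n and SR for k."""
    return s * (sr_table[n] + sr_table[k]) / 2.0


def layered_significance_tests(means, s, sr_table):
    """Return the pairs of means judged significantly different."""
    ordered = sorted(means)
    n = len(ordered)
    significant = set()  # (i, j) positions in the ordered array, i < j
    for k in range(n, 1, -1):  # subgroup sizes n, n-1, ..., 2
        allowance = mean_wsd(s, sr_table, n, k)
        found_any = False
        for i in range(n - k + 1):
            j = i + k - 1
            # Below the top layer, test a pair only if it is enclosed by a
            # pair one layer up that was already found significant.
            enclosed = (i - 1, j) in significant or (i, j + 1) in significant
            if k < n and not enclosed:
                continue
            if ordered[j] - ordered[i] > allowance:
                significant.add((i, j))
                found_any = True
        if not found_any:
            break  # no subgroup of this size differs, so testing stops
    return [(ordered[i], ordered[j]) for i, j in sorted(significant)]


if __name__ == "__main__":
    series = [8, 19, 22, 23, 27, 28]  # the series used in the definitions
    sr_5pct = {2: 2.77, 3: 3.31, 4: 3.63, 5: 3.86, 6: 4.03}  # approximate
    for low, high in layered_significance_tests(series, s=2.0, sr_table=sr_5pct):
        print(low, high)
```

With these illustrative inputs the difference between 19 and 27 (k = 4) is compared with 2.0 × (4.03 + 3.63)/2 = 7.66 and is significant, while adjacent pairs near the middle of the array are never reached because no enclosing pair is significant.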


Note that in testing the differences by 
layers as described in the procedure above, a 
difference cannot be significant unless the 
particular pair of means is also surrounded 
by another pair which have been found to 
differ significantly. To begin with, none of the 
differences are significant unless the extreme 
values differ significantly. In general, any
particular significant pair must belong to a 
larger group of which the extremes are sig- 
nificantly different. We may illustrate with a 
pair which is adjacent and near the middle of 
the range, say M₃ and M₄ (i.e., 22 and 23 in
the example above). Before M₃ can differ
significantly from M₄, either M₂ must be found
to be significantly different from M₄, or else
M₃ must differ significantly from M₅. Each
of these pairs depends in turn upon the significance
of means which have two means between
them, and so on.

Emphasis on the complexity of behavior
has led to the development of
standardized test batteries which 
provide several scores at the same 
time—i.e., “profiles.” Although such 
instruments may have many advant- 
ages over those which yield only a 
single score, they also introduce a 
number of statistical problems which 
complicate the analysis of the data 
they provide. 

These statistical problems arise in 
part simply because each individual 
contributes a set of scores which are 
not statistically independent meas- 
ures. Our present difficulty in han- 
dling sets of scores from individuals is 
evidenced by the variety of methods 
which have been proposed recently
for appraising the “pattern,” “level,” 
and “scatter” of profiles of individ- 
uals or groups of individuals, and by 
the lack of consensus among such 
methods. Data of this type can, of 
course, be analyzed effectively by 
multivariate techniques. However, 
since multivariate analysis is beyond 
the scope of many research workers 
not trained in advanced statistics, we 
are still faced with the need for use- 
ful methods for working with data 
which are essentially multivariate. 

The purpose of this study is to ex- 
amine and compare several methods 
for dealing with two questions which 
usually arise when one works with 
profiles—namely, the formation of 


1 This investigation was supported by research
grant M-637 from the National Institute
of Mental Health, Public Health Service,

groups and some criterion for deciding
when a “group” exists. By and
large, we will restrict our discussion 
to cases where the investigator has 
some rationale for grouping the pro- 
files and wishes to estimate the homo- 
geneity of the group and the proper 
membership of the individuals in it. 

The data for this study are 12 Min- 
nesota Multiphasic Personality In- 
ventory profiles (nine clinical scales) 
from individuals who were tested in a
clinic setting. These data were se- 
lected because, psychometrically, the 
MMPI is similar to many other tests 
in so far as it claims to have the fol- 
lowing properties: (a) The scales 
measure several basic aspects of be- 
havior, e.g., attitudes, values, or per- 
sonality dimensions, (b) each of the 
scales is standardized on rather large 
referent populations, and (c) although
each scale is somewhat independent,
no one scale can be interpreted
properly apart from the
others. 

These MMPI data were selected 
also because independent clinical di- 
agnoses were available for each of the 
12 individuals, who fell into the following
clinical groups: three “hypertensives,”
four “neurotics,” and
five “psychotics.” In the discussion
to follow, we will utilize these diagnoses.
The standardized MMPI
scores for these Ss are given in Table
1.


2 We are grateful to Robert E. Harris,
Langley Porter Clinic, San Francisco, for providing
us with the MMPI profiles and clinical

TABLE 1

MMPI DATA ON 12 SUBJECTS CLASSIFIED AS “HYPERTENSIVE,” “NEUROTIC,” OR “PSYCHOTIC”a

[Table body not recoverable from the source. Columns: Group I, “Hypertensive” Ss; Group II, “Neurotic” Ss; Group III, “Psychotic” Ss, each with its criterion profile (CP)b. Rows: the MMPI scales, followed by Mean and Sigma.]

a These MMPI scales are standardized so that for each scale, mean = 50, sigma = 10.
b CP = empirical criterion profile. See discussion of Table 2, Sets C and D.

PROCEDURE AND FINDINGS 

Our procedure and findings will be 
considered under the following four 
questions: 1. What “natural group- 
ings,” if any, exist among the pro- 
files? 2. What aspects of the profiles 
should be considered in forming 
groups? 3. What types of criterion 
profiles may be used in forming 
groups? and 4. How can one deter- 
mine the group-membership of indi- 
vidual profiles? 

We used two rather distinct ap- 
proaches to the question of how 
groups of profiles might be formed: 
the factor analytic and direct corre- 
lation methods. Some of the proper- 
ties and findings of these methods 
will be discussed in turn. 


1. What “Natural Groupings,” if
Any, Exist Among the Profiles? 


Although the factor analytic meth- 
ods were developed to isolate the 
number of “factors” present in a bat- 
tery of tests or test items, these 
methods have been adapted to select 
groups of persons or profiles. This is 
accomplished by intercorrelating per- 
sons over a set of test scores having 
the same scale properties and then 
factoring the obtained correlation 
matrix. Instead of “factors” in the 
traditional sense, one or more clusters
of persons emerge who are more
highly correlated among themselves 
than they are with the remaining per- 
sons included in the analysis. 

diagnoses. Incidentally, we are not interested
here in any substantive meaning that the diagnoses
might have, and no attempt was made
to select profiles “at random” from the three
clinical “populations.” Rather, for illustrative
purposes, profiles were selected which
were relatively similar within each classification.
Other sets of real and artificial data
have, of course, been studied by the procedures
discussed in this paper, and their applicability
is not dependent upon any particular
set of data.

Within limits, the investigator can
choose among the several factor analytic
methods now available to form
groups of profiles. His choice will de- 
pend, among other things, upon: his 
personal inclination to rely on the 
methods themselves or upon his own 
insights and judgments, the degree of 
methodological elegance desired and 
its status value to him, the amount of 
information he has regarding possible 
groupings, and various “practical”
considerations such as how much 
time, money, machinery, and assist- 
ance he has to get the job done. 

In addition to the orthogonal cen- 
troid method (Thurstone, 1941), we 
have selected the oblimax rotational 
solution (Pinzka & Saunders, 1954) 
and the multiple group method 
(Holzinger & Harman, 1941; Thur- 
stone, 1941) for illustrative pur- 
poses. The latter two methods differ 
greatly in that the oblimax method 
leaves the investigator free from hav- 
ing to make any subjective decisions 
after the data have been fed into an 
electronic computer, whereas the 
multiple group method requires the 
investigator to form tentative group- 
ings before beginning the analysis. 

The results obtained by these factor
analytic methods are as follows:³

3 The communalities were estimated by 20
iterative approximations according to the
method developed by Dickman for the
ILLIAC, with the result that the mean absolute
residual for the 12 communalities was only
.002.

The orthogonal centroid method.—
Since the “factors” or profile groups
obtained by this method are uncorrelated,
this method is useful when the
profile groups actually are independent,
when not more than one clear-cut
group exists, or when the investigator
is interested in the one group
which is most representative of the
total set of profiles. In the data in
Table 1, however, the profiles comprising
Groups I and II are somewhat
alike in pattern, but they differ
clearly from those in Group III.
(These relationships are also apparent
from the “correlations among the
factors” in Table 2.) In cases where 
more than one nonorthogonal group 
exists, the centroid method by itself 
provides information to select the 
profiles with the highest loadings on 
the first “factor” only. Since we 
have reason to expect that more than 
one group exists among the 12 pro- 
files in Table 1, and that the groups 
are nonorthogonal, some method for 
obtaining an oblique solution seems 
appropriate. The following method 
was used to provide an objectively 
obtained oblique solution. 

The centroid method with an oblimax 
solution.—This method enables the 
investigator to rely on the objective- 
ly obtained solution to provide the 
groupings which most closely ap- 
proximate simple structure. Al- 
though the oblimax method requires 
an electronic computer, it has certain 
obvious advantages, especially when 
the investigator has no idea at all of 
how the profiles should be grouped. 
The formation of three groups among 
the 12 profiles is indicated by the fac- 
tor loadings and the correlations 
among the factors, which are given as 
Set A of Table 2. 

The multiple group method.—The 
three clinical diagnoses were used to 
form the tentative groupings for the 
multiple group method in this study.
This method is a good deal less complicated
and time consuming than
the centroid method although, if
tentative profile groupings can be
formed which are reasonably well
constituted, the results of these two
methods do not differ appreciably.⁴


4 In the case of faulty initial grouping, reallocation
of profiles to their proper group
generally can be made from the results of the
multiple group solution. See discussion of
Question 4 below.

TABLE 2

COMPARISON OF FINDINGS FROM FACTOR ANALYTIC AND CORRELATION METHODS FOR
GROUPING 12 PROFILES INTO 3 GROUPS

[Table body not recoverable from the source. Column sets: Set A, Centroid Solution, Oblimax Rotation; Set B, Multiple Group Method; Set C, Correlation of Profiles: Interclass r; Set D, Correlation of Profiles. Each set gives values on Groups I, II, and III for the 12 profiles (Part I: Factor Loadings and Correlations), followed by the correlations among the groups.]

That is to say, the relation between
the results of these two methods can
be shown by the use of a transformation
matrix (Holzinger & Harman,
1941) which gives the relationships
among the coordinates of the two
systems, from which can be obtained
a matrix containing the cosines of the
angles (often interpreted as a function
of the correlation) between the
two sets of factors. From the transformation
matrix, which is given in
Table 3, we can assume that the
centroid factors could have been rotated
so as to give results essentially
the same as those obtained by the
multiple group method, which are
given in Set B of Table 2.

TABLE 3

[Title partly illegible in the source: RELATION OF FACTORS ... OF CENTROID FACTORING TO ...; table body not recoverable.]

a Compare also with the correlation among factors given in Table 2, Set B.

A comparison of the oblimax and
multiple group methods reveals that,
although any decision based on the
italicized values in Table 2 would result
in the same choice as to which
profiles belong together to form the
three groups, the factor loadings differ
appreciably for the two methods.
However, it cannot be expected that
these two methods will yield the same
numerical results, since the
loadings for the oblimax method are
in terms of projections on orthogonal
axes in oblique space, whereas the
multiple group, or the rotated centroid,
loadings are in terms of projections
on oblique axes in orthogonal
(i.e., coördinate) space.

Direct correlation method.—Another
approach to this problem of determining
profile groupings can be illustrated
with these same data. The
rationale for this approach is based 
on the following assumptions: (a) if 
the mean of a distribution is the 
measure which best represents all of 
the scores in that distribution, then 
for a given set of k profiles, the mean 
of the k scores for a given subtest 
should be the measure which best 
represents all of the k subtest scores; 
and (b) generalizing over the c sub- 
tests, the mean score for each sub- 
test should form the profile which 
best represents all of the k profiles 
(individuals) in the set. For example, 
in Table 1, Group I, the three indi- 
vidual profile scores for the Hs sub- 
test are averaged to obtain the mean 
Hs score of 71.67. When this pro- 
cedure is followed for each of the nine 
subtests, the profile thus formed will 
be called an empirical criterion pro- 
file. 

As with the multiple group meth- 
od, let us assume (at least tentative- 
ly) that the three clinical diagnoses 
provide a reasonable basis for grouping the 12 profiles. And, if we assume that three meaningful groups
exist among the 12 profiles, we can 
readily obtain three empirical cri- 
terion profiles, one for each group 
(cf. italicized values in Table 1). 
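
The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each profile is taken to be a list of subtest scores in a fixed subtest order, the function name `criterion_profile` is hypothetical, and the scores are invented for the example (they are not the Table 1 data).

```python
from typing import List, Sequence


def criterion_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Return the subtest-by-subtest mean of k profiles.

    The result is the empirical criterion profile for the group:
    one mean score per subtest, taken over the k individuals.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    c = len(profiles[0])
    if any(len(p) != c for p in profiles):
        raise ValueError("all profiles must cover the same subtests")
    k = len(profiles)
    return [sum(p[j] for p in profiles) / k for j in range(c)]


# Hypothetical scores: three individuals (rows) on three subtests (columns).
group = [
    [70.0, 55.0, 62.0],
    [72.0, 58.0, 60.0],
    [73.0, 61.0, 64.0],
]
print([round(m, 2) for m in criterion_profile(group)])
```

Applying the same function to each tentative group in turn would give one criterion profile per group.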

Given the obtained criterion profiles, we can then compute product-moment correlations (rs) between each of the 12 individual profiles and each of the three criterion profiles. For example, in Table 1, the r between CP I and the profile for S 1 is .93 and the r between CP II and the profile for S 1 is .45, and so on. Following this procedure we obtain the findings given under Set C of Table 2.⁵ It will be seen that these rs are rather similar to the factor loadings given in Set B, which are called the “factor pattern.” Furthermore, if we intercorrelate the three criterion profiles, we find that the relations among these profiles, as indicated by the rs, are similar to the relations among the factors given in Set B, which are called the “factor structure.”

5 It should be noted that the applicability of this procedure is not restricted to test batteries such as the MMPI. If the investigator has collected a set of measures on a reasonably large number of individuals, he can readily transform these raw data into standardized scores so that the measures (i.e., scale scores) have equal means and equal sigmas before he forms groups of profiles. Furthermore, this procedure can be used to answer a variety of questions: for example, if the investigator wished merely to select the one profile which is most representative of a particular group, he would determine which individual profile correlates highest with the criterion profile. Thus, in Set C of Table 2, Profiles 3, 4, and [illegible] are most representative of their respective groups.

From an inspection of the values in Sets B and C, it is apparent that the findings from these two methods agree closely. Of the 36 comparisons between the factor loadings and the rs, we find the largest discrepancy to be approximately .05, with a mean absolute difference of approximately .02. (We have already seen from Table 3 that a rotation of the centroid factors by Thurstone’s technique would give results which are essentially the same as those in Set B.) Although the differences between the findings for Sets B and C are unsystematic and small, we are not prepared to argue that either set is “correct” and is approximated by the other. Rather, it seems sufficient at this time to say only that these methods give results which are practically identical.

The direct method of correlating the individual profiles with the empirical criterion profiles avoids a number of disadvantages inherent in the factor analytic techniques. Differences in the relative ease of computation are obvious, since several time-consuming operations are sidestepped by the direct correlation method. Specifically, this method does not require the calculation of an initial correlation matrix, the estimation of communalities (cf. Wrigley, 1957), the factorization of the matrix, or the rotation of the initial solution that would probably be needed to obtain simple structure. (As with the multiple group method, when the initial groupings are well constituted, the results will approximate simple structure.) Moreover, the use of a criterion profile has an important advantage: it is intuitively meaningful since it shows which scales tend to have high or low scores for the group in question, and hence may be thought of as the “definition of a factor.” Such scalar values are not available when factor analytic techniques are used.
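
The direct correlation method described above can be sketched as follows. This is a minimal illustration, not the computation behind Table 2: the helper names (`pearson_r`, `correlate_with_criteria`, `best_group`) and all scores are hypothetical, and the sketch assumes that a profile is assigned to the group whose criterion profile it correlates with most highly.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a completely flat profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_criteria(
    profile: Sequence[float], criteria: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Correlate one individual profile with every criterion profile."""
    return {name: pearson_r(profile, cp) for name, cp in criteria.items()}


def best_group(profile: Sequence[float], criteria: Dict[str, Sequence[float]]) -> str:
    """Name of the criterion profile with the highest r (assumed assignment rule)."""
    rs = correlate_with_criteria(profile, criteria)
    return max(rs, key=rs.get)


# Hypothetical criterion profiles for two groups and one individual profile.
criteria = {"I": [70.0, 60.0, 50.0, 55.0], "II": [50.0, 55.0, 70.0, 65.0]}
individual = [72.0, 63.0, 52.0, 54.0]
print(correlate_with_criteria(individual, criteria))
print(best_group(individual, criteria))
```

No correlation matrix among the individuals, communality estimate, or rotation is involved; each individual profile needs only one r per criterion profile.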


2. What Aspects of the Profiles Should 
be Considered in Forming Groups? 


In computing the correlations given in Set C of Table 2, the product-moment or interclass coefficient (r)
was used. This was done to facilitate 
the comparison of Sets B and C in 
Table 2 since r, or some approxima- 
tion of it, characteristically has been 
used in factor analytic techniques. 
This is not to say, however, that r is 
the most appropriate measure to use 
when one is interested in forming 
groups of profiles based on standardized test batteries. The bivariate statistic r always equates the two variates being correlated by reducing them to deviation scores, so that only a measure of the similarity of the standard scores (sometimes called pattern) is reflected in r. However, it is inevitable that any information regarding differences in profile means (sometimes called level) and profile sigmas (sometimes called scatter), like “poor Clementine,” is lost and gone forever when r is used.

In forming groups of profiles, the 
coefficient of intraclass correlation (R) 
can be used effectively to reflect any 
meaningful differences that might 
exist among the profile means and/or 
sigmas (Haggard, 1958). That is to 
say, the statistic R enables the in- 
vestigator to consider or to ignore 
these differences in terms of the prop- 
erties of the particular test battery 
and his research questions when 
working with profiles from standard- 
ized tests. More specifically, if he 
wishes to equalize (i.e., disregard dif- 
ferences in) the profile means, he can 
add the appropriate constant to the 
scores in each profile and/or if he 
wishes to equalize the profile sigmas, 
he can divide the scores in each profile by its standard deviation before computing R.⁶ With these possibilities for adjusting the profile scores, one can obtain four meaningful sets of intraclass correlations between the individual and the criterion profiles. The four possibilities, which can be compared with the findings reported in Sets B and C of Table 2, are as follows: (a) the means and the sigmas of the profiles are equalized (this method permits grouping profiles in terms of their pattern only); (b) the profile means are equalized but the sigmas are allowed to vary (permits grouping profiles in terms of their pattern and scatter); (c) the means are allowed to vary but the sigmas are equalized (permits grouping profiles in terms of their pattern and level); (d) the means and the sigmas are allowed to vary (permits grouping profiles in terms of their pattern, level, and scatter).

6 In this type of problem we can assume a one-way analysis of variance design with c classes (subtests) and k replications (profiles). The R can then be computed from the mean squares of the analysis of variance table by using the formula: R = (BCMS − WMS) / [BCMS + (k − 1)WMS], where BCMS is the between classes mean square and WMS is the within classes mean square. In computing R for t[...] criterion [...] correlation [...] be computed for k of any size. [...] the profiles in the three groups (k = 3, 4, and 5) and for all three groups together are given in Table 4.
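
The four adjustment options, together with the one-way analysis of variance formula for R given in Footnote 6, can be sketched in Python as below. Several points are assumptions made for illustration only: the function names `intraclass_r`, `adjust`, and `r_adjusted` are hypothetical, the profile sigma is computed with n (not n − 1) in the denominator, and the equalizing steps follow the description in the text (divide by the profile's standard deviation, then add a constant that brings the profile mean to zero).

```python
import math
from typing import List, Sequence


def intraclass_r(profiles: Sequence[Sequence[float]]) -> float:
    """R = (BCMS - WMS) / (BCMS + (k - 1) * WMS) for k profiles over c subtests."""
    k = len(profiles)
    c = len(profiles[0])
    if k < 2 or c < 2 or any(len(p) != c for p in profiles):
        raise ValueError("need at least 2 profiles over the same 2+ subtests")
    class_means = [sum(p[j] for p in profiles) / k for j in range(c)]
    grand_mean = sum(class_means) / c
    # Between-classes (subtests) mean square, c - 1 degrees of freedom.
    bcms = k * sum((m - grand_mean) ** 2 for m in class_means) / (c - 1)
    # Within-classes mean square, c * (k - 1) degrees of freedom.
    wms = sum(
        (p[j] - class_means[j]) ** 2 for p in profiles for j in range(c)
    ) / (c * (k - 1))
    denominator = bcms + (k - 1) * wms
    if denominator == 0:
        raise ValueError("R is undefined when every score is identical")
    return (bcms - wms) / denominator


def adjust(profile: Sequence[float], equalize_mean: bool, equalize_sigma: bool) -> List[float]:
    """Optionally remove a profile's scatter and/or level before computing R."""
    out = list(profile)
    n = len(out)
    if equalize_sigma:
        mean = sum(out) / n
        sigma = math.sqrt(sum((x - mean) ** 2 for x in out) / n)  # assumed: n denominator
        if sigma == 0:
            raise ValueError("cannot equalize the sigma of a flat profile")
        out = [x / sigma for x in out]
    if equalize_mean:
        mean = sum(out) / n
        out = [x - mean for x in out]  # the "appropriate constant" is minus the mean
    return out


def r_adjusted(a, b, equalize_means: bool, equalize_sigmas: bool) -> float:
    """Intraclass R between two profiles (k = 2) under one of the four options."""
    return intraclass_r(
        [adjust(a, equalize_means, equalize_sigmas),
         adjust(b, equalize_means, equalize_sigmas)]
    )


# Hypothetical individual and criterion profiles: same pattern, different level.
individual = [1.0, 2.0, 3.0]
criterion = [11.0, 12.0, 13.0]
for label, means, sigmas in [("a", True, True), ("b", True, False),
                             ("c", False, True), ("d", False, False)]:
    print(label, round(r_adjusted(individual, criterion, means, sigmas), 3))
```

With these two invented profiles the pattern is identical, so the options that equalize the means give an R at or near 1, while the options that let the level vary give a much lower value.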

Although it would be possible to 
compare all the Rs between the indi- 
vidual and criterion profiles for each 
of these four methods, it will be suffi- 
cient for illustrative purposes to 
compare only methods (a) and (d). 

When profile means and sigmas 
are equalized (a above).—Under these 
conditions, R is reduced to r, so that 
R=r (Haggard, 1958). Consequently, the findings in Set C of Table 2
can be thought of as intraclass corre- 
lations computed on the profiles with 
equalized means and sigmas. 

When profile means and sigmas are
allowed to differ (d above).—The pos- 
sible effect of equating the profile 
means and sigmas in forming groups 
can be seen by comparing the findings reported in Sets C and D of Table 2. In each instance, the italicized
values are reduced in Set D, indicat- 
ing that the measure of correspon- 
dence of the profiles in each of the 
three groups is decreased when dif- 
ferences in the profile means and sig- 
mas are taken into account. This al- 
ways occurs when the profile means 
and/or sigmas differ. It is apparent
also that this decrease varies from 
profile to profile. In some instances, 
such as with S 8 in Group III, the 
drop in R is only from .862 to .857, 
but S 12 in this same group shows a 
drop from .910 to .354. (The reason
for the difference in the size of these 
Rs can be seen from an inspection of 
the means and sigmas of the indi- 
vidual and criterion profiles given in 
Table 1.) In connection with these 
findings, if an investigator decided to 


drop the least similar profile from 
Group III, we would expect him to 
drop either No. 8 or No. 12, depend- 
ing on whether he relied on the find- 
ings in Set C or those in Set D. 

From the above results it seems 
apparent that, if one assumes differences in profile means and sigmas to be important aspects of the profiles to be grouped, all of the correlations in Set C are too high and some
of them may be quite misleading. It 
also follows that, under the above as- 
sumption, the familiar factor analytic 
techniques which utilize r will yield
results which suffer the same short- 
comings when used to group indi- 
viduals on the basis of profiles from 
standardized test batteries. The co- 
efficient R is a more general and flex- 
ible measure of correlation with this 
type of data and should be used 
when the investigator is interested in 
possible differences in the profile
means and/or sigmas.

An additional question that may 
arise in grouping profiles has to do 
with the degree of homogeneity of a 
group of profiles taken together. In- 
traclass correlation can be used as a 
general descriptive measure to indi- 
cate group homogeneity since it can 
be computed readily with k of any size (see Footnote 6). To illustrate this possibility, the three profiles in Group I, the four profiles in Group II, and the five profiles in Group III were correlated both under conditions where the profile means and sigmas were equalized and where both were allowed to vary. The results are given in Table 4. One can, of course, use Rs computed on a set of k profiles as a general criterion or distance measure, such as by requiring that R reach some specified value
(e.g., .70 or .85), before considering a collection of profiles to be a “group.”

TABLE 4
INTRACLASS CORRELATIONS FOR GROUPS OF PROFILES

                                              Group I   Group II   Group III   Total(a)
                                              (k=3)     (k=4)      (k=5)       (k=12)
R (profile means and sigmas are equalized,
  i.e., R=r)(b)                                .809      .804       .843        .124
R (profile means and sigmas not equalized,
  i.e., R≠r)                                   .733      .526       .523        .057

a The Rs of .124 and .057 indicate the over-all agreement among all 12 profiles, i.e., the degree to which a general factor exists in the set of data.
b When k>2, R=the average of the k(k−1) possible rs when the profile means and sigmas are equalized before computing R (cf. Haggard, 1958).


3. What Types of Criterion Profiles
May Be Used in Forming Groups?


In using either factor analytic techniques or empirical criterion profiles, one’s findings are necessarily restricted to the members of the particular sample studied, and the best possible results of such methods would provide one or more groups of profiles which are most like each other in that sample. The findings of such methods are, of course, influenced by various sampling artifacts, since the groups are formed only on the basis of the profiles which are included in the sample.

In various research situations an investigator may not wish to be dependent upon the profiles in a particular sample in order to form groups of individuals. For example, he may wish to form groups in terms of one or more a priori “ideal” profiles which are based on theoretical considerations (cf., e.g., Abel et al., 1956). Or, in order to repeat a previous study, he may wish to form groups of individuals with profiles as similar as possible to those of groups studied previously by himself or by others. With the procedures discussed thus far, it is not possible to form groups around a priori profiles.

The flexibility of the direct correlation method can be extended to utilize a priori criterion profiles. After
such profiles are defined, any of the 
four methods of obtaining intraclass 
correlations discussed under Ques- 
tion 2 can be used to screen individ- 
ual profiles in order to select those 
which correlate most highly with the 
a priori criterion profile(s). The for- 
mation of a group would then depend 
upon the number of profiles and the 
degree of over-all homogeneity re- 
quired for the group or groups. 
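
The screening and homogeneity checks mentioned above can be sketched in Python as below. The sketch rests on assumptions that are not in the text: the names `intraclass_r`, `screen`, and `is_group` are hypothetical, the profiles are invented, no adjustment of means or sigmas is applied (option d under Question 2), and the cutoff is simply passed in by the investigator.

```python
from typing import List, Sequence


def intraclass_r(profiles: Sequence[Sequence[float]]) -> float:
    """R = (BCMS - WMS) / (BCMS + (k - 1) * WMS) for k profiles over c subtests."""
    k = len(profiles)
    c = len(profiles[0])
    if k < 2 or c < 2 or any(len(p) != c for p in profiles):
        raise ValueError("need at least 2 profiles over the same 2+ subtests")
    class_means = [sum(p[j] for p in profiles) / k for j in range(c)]
    grand_mean = sum(class_means) / c
    bcms = k * sum((m - grand_mean) ** 2 for m in class_means) / (c - 1)
    wms = sum(
        (p[j] - class_means[j]) ** 2 for p in profiles for j in range(c)
    ) / (c * (k - 1))
    denominator = bcms + (k - 1) * wms
    if denominator == 0:
        raise ValueError("R is undefined when every score is identical")
    return (bcms - wms) / denominator


def screen(candidates: Sequence[Sequence[float]],
           criterion: Sequence[float],
           cutoff: float) -> List[Sequence[float]]:
    """Keep the candidates whose R with the a priori criterion profile reaches the cutoff."""
    return [p for p in candidates if intraclass_r([p, criterion]) >= cutoff]


def is_group(profiles: Sequence[Sequence[float]], cutoff: float) -> bool:
    """Treat a set of k profiles as a group only if their over-all R reaches the cutoff."""
    return intraclass_r(profiles) >= cutoff


# Hypothetical a priori criterion profile and three candidate profiles.
criterion = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.1, 2.0, 2.9]]
selected = screen(candidates, criterion, cutoff=0.85)
print(selected)
print(is_group(selected, cutoff=0.85))
```

Screening against the criterion and checking over-all homogeneity are kept as separate steps because a set of profiles can each resemble the criterion and still fall short of the homogeneity required of the group as a whole.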


4. How Can One Determine the Group- 
Membership of Individual Profiles? 


In view of the findings given in 
Table 2, the grouping based on clin- 
ical diagnoses adequately partitioned 
the 12 profiles in the illustrative data 
which we have cited. But it is not to 
be expected that the groupings will 
always “come out right” the first 
time. If they do not, the investigator 
is faced with the problem of reassign- 
ing improperly classified profiles to 
their proper group. 

With either the multiple group or direct correlation methods, information of the type given in Table 2 generally will indicate the appropriate group membership of an individual profile if the tentative grouping is reasonably correct. That is to say, these two methods can be used effectively when the investigator is not completely ignorant of how the profiles should be grouped, but wishes to
determine the appropriateness of his 
tentative grouping and to correct for 
any misgrouping. 

This case can be illustrated by the 
following example: Let us assume 
that profile No. 6 had been placed 
(i.e., misplaced) in Group I. For il- 
lustrative purposes this was done 
with the result that the factor load- 
ings and correlations for profile No. 
6 clearly deviate from those of Nos. 
1, 2, and 3 in Group I, and conform 
more nearly to those of Nos. 4, 5, and 
7 in Group II. Such findings indicate 
that No. 6 should be assigned to 
Group II. An additional effect of 
profile misgrouping is to inflate the 
correlations among the factors or cri- 
terion profiles. As the profiles in 
the different groups become max- 
imally homogeneous (i.e., optimally 
grouped), these correlations will ap- 
proach their minimum value; if the 
profiles were assigned at random to 
the groups, these correlations would 
approach +1. 

Questions regarding group mem- 
bership also arise when an investiga- 
tor has one or more new profiles and 
wishes to assign them to the appro- 
priate existing group. The most ef- 
ficient method for doing this is to cor- 
relate the individual profile(s) in 
question with the existing empirical 
or a priori criterion profile(s).⁷ The
assignment of an individual profile to 
a group will depend in part upon 
which of the possible correlations and 
which type of criterion profile are 
used, as indicated in the discussion of 
profiles Nos. 8 and 12 under Question 
2 above. It is to be expected that the 
same criteria which are used to define a group in the first place will apply when new profiles are added to it.

7 With the factor analytic methods, the consideration of new profiles would require an enlarged correlation matrix and refactoring.


SOME CONCLUDING REMARKS


In this paper we have touched only 
lightly upon, or have bypassed com- 
pletely, two rather important issues. 
They deserve further comment. 

The first issue has to do with the 
fact that the product-moment coefficient r, and more elaborate techniques based upon it, ignore the scalar values of the variables which are
correlated. In the bivariate case, 
where two different scales of meas- 
urement are involved, it is not pos- 
sible to compare the relative position 
of the paired scores in their respec- 
tive distributions and at the same 
time consider the scalar values of the 
paired scores. In such cases r is an 
appropriate measure of correlation, 
but since r always converts the original variables into standard score units, it eliminates any information regarding the original scalar values.

Frequently, however, test batteries which yield a set of scores from which profiles can be formed are standardized so that the two or more variables have a common scale of measurement, i.e., a common mean and a common sigma. Under these conditions, one may wish to take account of their scalar values. For example, many clinicians assert that information regarding profile level and scatter must be taken into account in evaluating the meaning of one or more profiles. In such cases, it is clear that the intraclass coefficient R, a univariate statistic, can be used. Furthermore, it can be shown that, in many instances, meaningful groups of profiles can be formed only when their scalar values are taken into account. This point will be discussed in more detail in another paper.

The second issue has to do with the possible statistical significance of the
coefficient of intraclass correlation. 
In this paper we have considered R 
only as a distance or descriptive 
measure, although the statistical sig- 
nificance and confidence limits of 
any R can be estimated under the 
proper conditions. But since each of 
the 12 individuals contributed nine 
measures in the profile data which 
we have used, it cannot be assumed 
that these nine measures and their 
standard errors are independent in the 
sense of being uncorrelated. In other 
words, we must assume that these 
profile data are essentially multivari- 
ate, and, consequently, a univariate 
test of significance (e.g., F) cannot be 


applied to these data in their present 
form. 

It is possible, however, to convert 
profile data such as we have used by 
dividing each profile score by its 
standard error of estimate to obtain 
“stabilized scores.” It can be shown 
that stabilized scores have statistical 
properties which enable the investi- 
gator to analyze various aspects of a 
group of profiles, such as level and 
scatter differences or the degree of 
over-all homogeneity, and to deter- 
mine the statistical significance of his 
findings. The procedures for carry- 
ing out such pattern analytic studies 
have been presented elsewhere (Haggard, 1958).

A REVIEW OF SENSORY PRECONDITIONING¹


ROBERT J. SEIDEL
University of Pennsylvania²


The paradigm for SPC was estab- 
lished by Brogden (1939), as was 
the name ‘‘Sensory Preconditioning.” 
The procedure consists of the follow- 
ing three stages: (a) repeated con- 
tiguous unreinforced presentation of 
intersensory stimuli, (b) establishing 
a response to one of them, and (c) 
testing transfer of response to the 
other stimulus. Unfortunately, a 
control which is necessitated by this 
procedure has not been utilized in a 
number of experiments (Bahrick, 
1952; Brogden, 1939; Karn, 1947).
That is, equal exposure to the test 
stimulus must be given to both ex- 
perimental and control groups. Lack- 
ing this control, the eventual differ- 
ence between the group initially pre- 
sented with paired stimuli and the 
control group could be attributed to 
differential familiarity with the test 
stimulus. Consequently, according to 
Reid (1952), these early studies by 
themselves are not conclusive. 

The present study is a review of the existing data in this area with an attempt at reconciliation of certain of the more apparent inconsistencies. A descriptive analysis of the experimental setting will be offered; and, later, research will be suggested to clarify the existing body of information concerning sensory preconditioning (SPC). Within this approach the paper is directed toward a general consideration of three questions: (a) Is sensory preconditioning, as substantiated by the existing data, a phenomenon to be dealt with by learning theory? (b) If so, what laws of learning does it follow? (c) What are some of the problems which the learning theorist faces in attempting to integrate the data of SPC into his system?

1 The author gratefully acknowledges the criticisms and analyses offered by William A. Shaw, Ronald H. Forgus, and Howard Ranken during the preparation of this manuscript.

2 Now at Research Directorate, Air Force


THE EXPERIMENTAL EVIDENCE 


Animal studies. The initial experi- 
ment on “sensory preconditioning” 
was done by Brogden (1939). Eight 
experimental animals were presented 
with 200 pairings of a bell and a light. 
Secondly, one of these stimuli was 
used as a CS in a shock-avoidance 
setting until a criterion of avoidance 
was reached. During the test trials 
the other stimulus was presented and 
responses to extinction were re- 
corded. The control animals which 
had not been exposed to the precon- 
ditioning pairing gave significantly 
fewer Rs to the unreinforced stimu- 
lus. 

Subsequently, Reid (1952) performed an experiment with 16 pigeons. He trained them in a modified Skinner box to peck for food reward at a signal. In the test situation, pecking Rs were counted to the other stimulus which was presented without reward. The design was modified so that the control and experimental groups were given equal amounts of exposure to the buzzer and light during pretraining; however, for the Control Ss the stimuli were not paired but presented separately. With the Ss equated in this manner no significant differences were obtained between experimental and control groups in number of pecking Rs to the test stimulus (either buzzer or light). Reid does summarize results of an unpublished study, using [illegible], which was done by MacPhersen, in which Brogden’s original finding was confirmed; however, this study did not equate amount of exposure to the test stimulus for the two groups.

Bahrick (1952, 1953), using rats in an avoidance situation, obtained somewhat ambiguous results. He did find that high drive (food deprivation) during preconditioning led to greater positive transfer effects than did low drive (satiated); but the surprising outcome was that the control group (under high D) showed the transfer to as great a degree as the Low Drive experimental group (Bahrick, 1953). A possible explanation for this occurrence may lie in Bahrick’s use of the same apparatus for exposure and training. Perhaps there was a sufficient number of cues in the apparatus other than the buzzer to mediate the transfer effect to the light. This explanation may plausibly account for Reid’s data, also.

Several positive results have been obtained recently in an avoidance training situation by Silver and Meyer (1954). Unlike Bahrick, however, they used an exposure apparatus which was distinctly different from that used in the other two phases of the experiment. These authors, using rats, found no significant differences among three control groups, one of which had had pretraining to the test stimulus alone (light or buzzer), one to the training stimulus alone (buzzer or light), and one with no pretraining experience. Apparently, differential exposure to the test stimulus was unimportant. These findings were subsidiary to the main purpose which was to relate

sensory preconditioning to classical 
conditioning by showing that the 
same optimal temporal relationships 
hold for connection of the intersen- 
sory stimuli to occur as for the CS- 
UCS in the Pavlovian paradigm. 
The three experimental situations in 
preconditioning were: simultaneous, 
forward (.5 second between stimuli),
and backward (.5 second between 
stimuli, but the training and test 
conditions were reversed). The re- 
sults partially support their hypothe- 
sis of similarity of the two proce- 
dures since “forward” sensory pre- 
conditioning resulted in greater posi- 
tive transfer effect in the test avoid- 
ance training than either of the other 
experimental conditions. The latter 
two did not differ from one another in 
transfer effect. The fact that the ex- 
perimental groups as a whole gave 
more avoidance Rs in the test situa- 
tion than did the controls indicates 
that “backward” preconditioning ex- 
ists. Yet does such a temporal rela- 
tionship between CS-UCS exist for 
classical conditioning? Without con- 
sidering this PC phenomenon neces- 
sarily analogous to what has been 
called “backward” conditioning, at 
this point one might raise the ques- 
tion of a possible difference between 
the two in temporal parameters. 
(Coppock’s S-R analysis [1958] to be 
covered later presents a forward con- 
ditioning interpretation of such an 
occurrence in SPC.) 

Finally, a recent study by the author (1958) extended Bahrick’s findings regarding the role of specific responses as possible mediators in
SPC. Hooded rats were exposed to 
the PC stimuli when food-deprived, 
and later were split into hungry, 
thirsty, and satiated groups during 
avoidance learning (and transfer). 
All three experimental groups showed 
the same degree of positive transfer 
when compared to the control group
(initially exposed only to the test
stimulus). The reader will note that 
the experimental animals showed this 
equivalent effect despite differences 
in degree of similarity between the 
autonomic response-complex present 
during the preconditioning period 
and that present during the training- 
testing phases. It seems also possible 
then, that in SPC, unlike condition- 
ing, the role of the response is an un- 
important one. 

Human studies. Karn's study 
(1947) most resembles Brogden’s orig- 
inal design. This author used finger- 
flexion avoidance training. The 12 
experimental Ss (college students) 
received 50 simultaneous presenta- 
tions of buzzer and light; all 24 Ss 
were then trained to criterion to 
avoid shock by responding to the 
buzzer; and finally, all were given 10 
unreinforced trials to the light. The 
control group had no pretraining.
The results agree with Brogden’s 
data, but suffer from the same flaw, 
unequal exposure to the test stimulus 
(favoring the Experimental Ss). 

Brogden’s 1942 study incorporat- 
ing the GSR as the CR met this lack 
and the results turned out negative. 
But the outcome was attributed by 
the author to lack of a reliable meas- 
ure of conditioning; and hence, the 
experiment was not a valid test of 
SPC. In the rest of his experiments, 
which were somewhat more success- 
ful, Brogden (1947, 1950; Brogden & 
Gregg, 1951; Chernikoff & Brogden, 
1949) also controlled for possible dif- 
ferential effects. 

In one study where Brogden (1947) 
utilized a reaction time measure instead of GSR, he was successful in
obtaining the SPC effect. The train- 
ing, transfer, and extinction test pro- 

cedures were: 30 trials to light; 10 
trials to tone; 10 extinction trials to 
light. Included were three control 
groups: (a) Given no pretraining 


(preconditioning) period. This con- 
dition provided a test for sensory 
generalization (based upon unequal 
exposure to the test stimulus). (b) 
Given exposure to the test stimulus 
alone equal to that of the experi- 
mental groups. This was the usual 
SPC control condition. (c) Given no 
pretraining and no transfer test to the 
tone. This group acted as the SPC 
control condition for the extinction 
test of reaction time to the light. All 
Ss were told to respond to the light 
only and they would be shocked if 
they were too slow. Actually, no 
shock was given. The instructions 
were given after S had been told to be 
seated and E “accidentally” had pre- 
sented the preconditioning stimuli 
while “fixing” the apparatus. 

The transfer test was successful 
in showing SPC. In this test Control
Groups (a) and (b) did not differ
from one another even though (b) had
the advantage of sensory generaliza- 
tion. The extinction test was not 
successful. It was based upon the as- 
sumption that the unreinforced (no 
shock) tone presentations should 
have extinguished the shock-expec- 
tancy to the greatest degree for the 
experimental group, next for Control 
Groups (a) and (b), and least for 
Control Group (c). The first three 
groups showed similar marked ex- 
tinction (latency increase) to the 
light while Group (c) showed none. 
Apparently, the 10 unreinforced tone 
trials, regardless of prior associations,
were sufficient to extinguish the expectancy.

Chernikoff and Brogden (1949) 
repeated the experiment using elec- 
tronic equipment, and a diffuse tone 
source, all of which seemed to increase the efficiency of the experiment since positive transfer results were obtained with only 10 Ss per group whereas Brogden had used 42. Also, the percentage of Ss responding in the test series was twice that of the
other study. Another variable in 
this experiment was instructional 
variation. One experimental group 
was given the same instructions as in 
the previous study, while two others 
were told “not to respond” or to “do what seems natural” in the test situation. Only the group given the old instructions evidenced a significant
difference from the controls. 

In another experiment (1950) 
Brogden utilized a diffuse source for 
both tone and light in precondition- 
ing. This time he failed to get suc- 
cessful results with the usual meas- 
ures. By adding the procedure of 
measuring absolute auditory thresh- 
olds to the preconditioning tone at 
the end of the experimental sessions, 
he obtained positive results. He 
found that the presence of the light 
with the tone led to greater “lower- 
ing” of the auditory threshold for 
the experimental group than for the 
controls. The “lowering” is put in 
quotes since again the data are sub- 
ject to contamination by the S’s pos- 
sible deliberate pressing of the key 
even when he was in doubt about 
hearing the tone. This is quite pos- 
sible since: (a) he was instructed to 
respond to tone if present even when 
in doubt; and (b) he had previously 
experienced the light and tone simul- 
taneously. This again points up the 
need for an involuntary response. 
With regard to the second point, it
has long been known that facilitation 
takes place when one of these stim- 
uli is supplemented by the other 
(Child & Wendt, 1938). It is possible 
that in this study the difference in the 
facilitation effect (lowering of the 
tonal threshold) between control and 
experimental groups was a result of 
the excess paired presentations of 
tone and light given the experimental 
group. 


Brogden and Gregg (1951) re- 


peated the threshold procedure in 
six experiments with variations on: 
(a) sequence of threshold trials (with 
and without light), (b) number of preconditioning pairings, (c) steps in obtaining threshold, and (d) illumination (increase or decrease). No significant t ratios were obtained for the
above variations, but the experi- 
mental group as a whole showed the 
same results as the earlier study. In 
recalling the proposed relationship 
between sensory preconditioning and 
classical conditioning, it should be 
noted that the strength of a CR is a 
function of the number of reinforced 
trials. Although the exact value 
of this finding cannot be determined 
since his data are confounded with a 
possible facilitation effect, Brogden’s 
data indicate that no such relation- 
ship between frequency of exposure 
and association strength exists in 
SPC. Brogden also summarized two 
unpublished studies which support 
the above positive findings. 

A recent attempt at sensory pre- 
conditioning with human subjects 
was made by Bitterman, Reed, and 
Kubala (1953), who wanted to show 
that SPC produces as stable an effect 
as does classical conditioning. Their 
rationale rests on a sensory integra- 
tion approach to preconditioning, 
and they hoped to indicate that a 
Hullian S-R interpretation could not 
predict the same results. The S-R 
argument presumed is that the need 
reduction following PCS (precondi- 
tioning stimuli) would be less than 
that following the CS; consequently, 
the sEr would be weaker following a 
given number of sensory precondi- 
tioning trials than for the same num- 
ber of conditioning trials. 

While their results show no differ- 
ence in response to extinction be- 
tween PCS and CS, their data should 
be evaluated cautiously. First, a dif- 
ficulty in interpreting their findings 
stems from the fact that the two
PCSs were lights differing in position 
on a panel. Technically, then, this 
procedure deviated from the usual 
one since no intersensory relation 
was attempted. Moreover, the re- 
sponse measured was the GSR, 
something which requires extreme 
care when conducting any sort of 
conditioning experiment. Seated in a
semidarkened room, the precondi- 
tioning group was presented with the 
training and generalization stimuli, 
sometimes with the CS alone and 
sometimes paired with the PCS, so 
that the termination of one coin- 
cided with the onset of the other. 
The conditioning group, on the other 
hand, was presented with each stimu- 
lus on separate trials for the same 
total number of pretraining trials.
Extinction for all Ss to both the CS 
and generalization stimulus (the 
other PC stimulus for the PC group) 
followed the training session.

Following upon the use of the 
GSR and this procedure, two possi- 
ble weaknesses seem to discount 
these results. To begin with, a GSR 
is elicited by a wide variety of stim- 
uli, and was most probably present 
during the pretraining period; hence, 
the preconditioning paradigm was 
not followed. In order for a valid 
procedure to have been used, it would 
have been necessary first to extin- 
guish the GSR to the experimental 
stimuli. Secondly, since a GSR prob- 
ably did occur to light itself, it is 
quite possible that a summation ef- 
fect occurred in the preconditioning 
group. This would mean that a 
greater GSR could have been elic- 
ited to both lights in the precondi- 
tioning period than for each light 
presented separately in pretraining 
(the condition for the control group). 
Thus, contrary to the implication of 
Bitterman, et al., that the stimuli
were initially neutral, it seems prob-
able that they were not. Further, 
the authors asserted that stimulus 
generalization could not explain the 
results. However, if the GSR and 
summation did occur in the manner 
outlined above, then the precondi- 
tioning group should have been con- 
ditioned to a greater degree to light 
stimuli than the conditioning group.
Consequently, it is quite possible 
that greater stimulus generalization 
could account for the fact that the so- 
called preconditioning group showed 
greater generalization in extinction 
(one measure of preconditioning) 
than did the conditioning group. Un- 
fortunately, there was no measure of 
GSR reported for the pretraining 
period for either group so that, al-
though highly probable, the evalua-
tion requires additional data for sub- 
stantiation. 

The latest SPC study (Coppock, 
1958) to appear was concerned with 
“pre-extinction” and involved the 
use of GSR in classical conditioning 
(shock as UCS). Although the data 
shed some light on the meaning of 
the foregoing experiment, the results 
raise questions related to the inter- 
pretation of GSR in SPC as a medi- 
ating response for S-R theory. 

The experiment consisted of four
experimental groups and a control
group. The latter was exposed to the
PC stimuli (light and tone) sepa-
rately on randomly alternated trials.
Two of the other groups, PC and
IPC, were analogous to the “forward”
and “backward” PC groups in Silver
and Meyer’s study. The interstimu-
lus interval in the present study,
however, was 1 sec. as compared to .5
sec. in the other; and Coppock
referred to the inverted stimulus
presentation as IPC (inverted PC)
rather than “backward.”

The two pre-extinction groups
were treated like PC initially.
One group, IPE, was immedi-
ately given an equal number of in-
verted exposures of the PC stimuli
(like IPC). The other group, SPE,
was presented with the first stimulus
(CS) on unreinforced trials.
Coppock’s predictions based
upon an S-R mediation analysis were:
(a) IPE>SPE, (b) SPE>C, de-
pending upon success of pre-extinc-
tion, (c) IPC>C, and (d) PC>C.
The S-R analysis of SPC will be dis-
cussed later in the Theory Section.

The results did not completely
confirm the proposed S-R hypotheses.
The nonparametric comparisons
made by Coppock revealed (a) PC
>C, (b) IPE>SPE, and (c) IPC
and SPE did not differ signifi-
cantly from C. Since the experi-
mental effects were found to be inde-
pendent of GSR reactivity, per se,
and of procedural variables, it is not
clear why the IPC group should have
shown [...] training-extinction effects
while SPE Ss exhibited no pre-[...]
effects. Unfortunately,
certain additional statistical compari-
sons were not made which would
have provided a basis for more thor-
ough evaluation (viz., [...]
in the pre-extinction
setting). For example, it is apparent
from the graph of the transfer data
that IPE was equal to and possibly
superior to PC. Also, SPE seemed
equal to PC. Certainly if “extinc-
tion” is to be meaningfully applied in
this experiment, whether from S-R
or S-S viewpoints, equality of SPE
and PC presents theoretical difficul-
ties. In addition, unless it is assumed
that SPC association was maxi-
mum in the PC group, S-S theory
should have predicted IPE > PC since
the IPE group had twice as many
pairings of PC stimuli. Finally, ac-
cording to S-S theory IPC should
have done almost as well as the PC 
group. (Reversal of S-S appearance 
could have weakened the expectancy 
slightly.) 

It was stated at the outset of this 
analysis that Coppock’s study (1958) 
shed some light on the ambiguities 
in the GSR experiment of Bitter- 
man, et al. (1953). As noted earlier in 
the discussion, Coppock found equiv- 
alence in GSR reactivity among all 
groups at various stages of the experi- 
ment and that the treatment effects 
were independent of GSR magni- 
tude, per se. While one cannot neces- 
sarily infer between experiments in 
this regard, such a finding does lend 
somewhat more credence to Bitter-
man’s findings.

As one final point, it should be 
noted that in Coppock’s experiment 
the GSR did not follow the custom- 
ary S-R curve of extinction. Here 
again, as in Seidel’s experiment 
(1958), the role of a specific response 
in SPC seemed irrelevant to the de- 
gree of association of the PC stimuli. 

Apart from the theoretical prob- 
lems, Coppock's IPC group (1 sec. 
interstimulus interval) yielded data 
contrary to those previously ob-
tained in a comparable condition
(cf. p. 59). Silver and Meyer’s “Back-
ward” PC group (.5 sec. stimulus 
interval) showed mediation equal to 
their simultaneous PC group (1954). 
What basis may exist for the dis- 
crepancy may only be speculated 
upon at this point: voluntary (avoid-
ance) vs. involuntary (GSR) re-
sponse, rats vs. humans, difference in
temporal intervals (.5 vs. 1 sec.).
Clearly, this problem needs further
investigation.

One experiment (Wickens & Briggs,
1951) used an identifying response
during presentation of the PC
stimuli. This study attempted
to show that SPC is merely an
instance of “mediated stimulus gen-
eralization” (MSG), and that con-
tiguity of the PC stimuli is unneces-
sary to obtain the desired transfer
effect. One group of college students 
was exposed to 15 contiguous presen- 
tations of tone and light, while an- 
other group was given 15 separate 
presentations of tone and 15 of light 
in random order. During the PC pe- 
riod the Ss were asked to give a 
verbal recognition response (“Now”)
to the stimuli. Both groups showed 
the same significant advantage in 
transfer of an avoidance R over the 
control groups which had given the 
verbal response to a tone 15 times 
or to a light 15 times.
On the surface the hypothesis 
seems to have been substantiated, 
but at least two points should be ex- 
amined before the conclusion is ac- 
cepted. If the identifying response is 
considered as instrumental in kind, 
then it follows obviously that the 
above-noted transfer effect stands as 
an example of S-R learning. How- 
ever, the generalization that the con- 
cept SPC is in like manner an aspect 
of S-R learning (via mediated stimu- 
lus generalization), although sug- 
gested by, does not necessarily follow 
from a single experimental outcome. 
In fact, if one assumes that the identi-
fying response acted to “set” the Ss
to connect the two stimuli, one would 
expect the obtained transfer differ- 
ences to occur. Stated in another 
way, the Wickens and Briggs study 
showed that the mediating response 
is a sufficient condition in SPC. In 
order to show that the response is 
both a necessary as well as a sufficient 
condition to effect mediation, it 
would be essential to eliminate non-
response induced “sets” as possible
mediators. An example of the latter 
would be the increase in pronounced- 
ness of the PC stimuli by delimiting 
the quality and quantity of other 
stimuli available to S (enhancing at-
tention value, per se). Further, to
compare MSG and SPC directly, the 
above study should have included 
two experimental groups (pure SPC) 
exposed to the two stimuli minus the 
identifying response. Experimenta- 
tion noted earlier pertinent to the 
role of possible responses during the 
latency period of SPC and MSG will 
be discussed further subsequently. 

One other related set of experi- 
ments concerns mediate association 
and requires consideration as a possi- 
ble verbal S-R analogue to mediated 
stimulus generalization. The general 
principle of mediate association re- 
quires that previous associations be- 
tween two ideas will facilitate the 
establishment of one of these with a 
third hitherto unrelated idea. This
concept took the experimental form
of learning paired associates. Peters 
(1935) used various pairs of meaning-
ful and nonmeaningful verbal and
motor tasks to investigate the con- 
cept. The sequences of associations 
frequently involved using the re- 
sponse as the common item (A-B, 
C-B, A-C). Once the stimulus was 
the common element (A-B, A-C, 
B-C) and once the response in the 
first pairing became the stimulus in 
the second (A-B, B-C, A-C). In no 
instance did the t test show the ex-
pected facilitating effect. The only
procedure which even approached 
significance in this direction was the 
last one noted above, A-B, B-C, 
A-C, in which Peters used months 
(B), numbers from 1-12 (A), and let- 
ters (C). 

A more recent experiment by
Bugelski and Scharlock (1952), with
the latter procedure and nonsense
syllables as the learning material,
produced positive results. Important
to note is that here the t test again
failed to yield significance. How-
ever, the order effect in the expected
direction was significant. Unfortu-
nately, the individual results were not
available in Peters’ article so that the 
order effect could not be tested in his 
data. Nevertheless, if one adopts 
tentatively the suggestion offered by 
Bugelski and Scharlock that the 
order of association is important, the 
phenomenon fits neatly into the 
classical mediating generalization 
framework. The sequence A-B, 
B-C, A-C would be expected to 
yield facilitation of A-C, whereas 
A-B, C-B would not and could be 
considered similar to backward con-
ditioning.3 The SPC data, although
by no means clearcut, suggest that 
temporal ordering of PC stimuli (and 
thus also of the unobserved re- 
sponses) may not be important. Both 
the IPE and IPC groups in Cop- 
pock’s study (1958) and the “Back-
ward” PC group in Silver and Mey-
er’s experiment (1954) could be con- 
sidered the analogue in SPC to the 
inverted C-B condition. The IPE 
and Backward PC groups showed the 
SPC effect while the IPC group did 
not. Although not conclusive, these 
results provide an indication that 
sensory preconditioning may not be
simply an instance of S-R learning 
as is mediated generalization. Cer- 
tainly, a more detailed comparison 
of the temporal parameters governing 
the instances of mediated S-R learn- 
ing and SPC is needed. 

3 Razran (1956) to the contrary notwith-
standing, recent experimental literature indi-
cates that so-called backward conditioning is
either an artifact of conditioning procedures
(Harris, 1941) or an unstable, weak, transient
effect (Spooner & Kellogg, 1947).

Before going on to theoretical im-
plications, it would be well to sum-
marize the empirical findings. At
this point, SPC seems generally sub-
stantiated as a phenomenon in learn-
ing. Further, there are indications
that the required conditions for its
occurrence seem to be little more
than repeated stimulus contiguity.
The above analysis hints at a lack of 
importance in temporal relationship 
between PC stimuli. Brogden’s data 
(Brogden & Gregg, 1951) suggest 
that number of repetitions (i.e.,
analogous to Hull’s N) does not oper-
ate in SPC as in S-R learning. Sim-
ilarly, Coppock’s study (1958) sug- 
gests that extinction in SPC does not 
follow the usual curve related to num- 
ber of unreinforced CS repetitions. 
In addition, Coppock’s results, those 
of Bahrick (1952, 1953) and of the 
author (1958) reveal that the ex- 
istence of a response during the PC 
period is unimportant for SPC to oc- 
cur. These facts must be taken into 
account when one attempts to class 
sensory preconditioning as an in- 
stance of a given conceptualization 
of learning (i.e., S-S or S-R theory). 


THEORETICAL INTERPRETATIONS 


Most of the experimenters have
not attempted to theorize about the
nature of sensory preconditioning.
An exception is Brogden, who
interpreted his results in
terms of Guthrian theory and hy-
pothesized an unknown UCR and CR
to the neutral stimuli.

Wickens and Briggs (1951) and 
Silver and Meyer (1954), in agree- 
ment on the apparent lack of rein- 
forcement in the SPC situation, have 
attempted an S-R analysis of the 
learning process in terms of “mediat-
ing stimulus generalization.” Ac-
cording to Silver and Meyer, the 
buzzer and light are actually uncon- 
ditioned stimuli which lead to “not 
directly observed” unconditioned re- 
sponses. After frequent pairing, each 
of these stimuli comes to elicit equally 
difficult-to-observe conditioned re- 
sponses. The resultant in transfer 
from this initial cross-conditioning, 
the reader will recall, is that in test 
trials one should expect positive
transfer effect. 

Coppock’s S-R analysis (1958) 
differs slightly from the cross-condi- 
tioning approach in that he assumed 
that in accord with conditioning 
principles the temporal relationships 
between S1 and S2 during PC deter-
mine which response-complex could 
provide the response-produced stim- 
ulus as a basis for mediation. This 
means, as shown in Fig. 1, that the
S-R mediation process differs be-
tween Coppock’s PC and IPC groups.
As pointed out by Coppock (1958, 
p. 218) the IPC mediator existent 
during training was the response- 
produced stimulus of a CR (Ris) 
which was undergoing extinction dur- 
ing that stage. On the other hand, 
the usual PC group has a UCR as 
a base for the response-produced 
mediator (e.g., Res»). Note that, as a
result of his traditional S-R analysis,
Coppock labeled the CR-mediation 
group inverted PC rather than “back- 
ward” PC as did Silver and Meyer. 
His analysis has the advantage in 
that it is less ambiguous to predict 
from the UCR-CR distinction than 
from the cross-conditioning analysis 
that PC>IPC. Further, Coppock 
could predict that the IPE group,
benefiting from the added CR-medi-
ation after extinction of PC, should 
show greater transfer than a group 
having undergone simple extinction 
of preconditioning connections (SPE). 
On the other hand, neither approach 
is adequate to account for SPC data 
that reveal preconditioning inde-
pendent of the response during the PC
period (Bahrick, 1953; Coppock, 
1958; Seidel, 1958). 

In formulating his mediation analy- 
sis of SPC, Osgood also abandons 
the concept of reinforcement as a 
necessary condition for learning. In 
fact, after raising the fact of no ex- 
tinction after many secondarily (at 
most) reinforced trials of mere bom- 
bardment by stimuli, he concludes 
that sensory preconditioning pro- 
vides “one of the strongest arguments
against reinforcement theory” (1953, 
p. 462). 

Osgood’s S-R explanation differs 
from the above, however, since he 
suggests that “a common perceptual
reaction” (e.g., attentional) is elic- 
ited initially to the novel stimuli. 
“If one of these ... is now ... con-
ditioned to a new reaction, the self- 
stimulation produced by the media- 
tion process...” is inferred (p. 461). 
The obvious difficulty with his in- 
terpretation, to agree with Osgood
himself, is that nothing of the pro-
posed process is directly apparent in
the subject’s behavior. Osgood is
forced to draw upon analogous evi-
dence from conditioning (e.g., Ship-
ley, 1933) to substantiate his point.
However, as noted earlier, whether or
not such analogies are correct awaits
further experimentation in the areas
of SPC and MSG to establish as fact
the existence of similar stimulus and
response relationships in both types
of procedure.

As implied by the discussion in the
preceding section, there appears to be
a real inconsistency between any S-R
analysis of SPC and the analogous
evidence cited as support for media-
tion. With reference to Osgood, al-
though in all of his discussion of the
mediation process he states that it is
a part of previous instru-
mental behavior, it is difficult to see
how such a conceptualization could
apply to SPC. No instrumental re-
sponse is called for in this paradigm,
nor is one differentially rein-
forced if it does occur. Further, from
the studies discussed, when autonomic
responses were made consistent or
recurrent during the PC pairing, they
apparently had no influence on the
association of the two stimuli. If it is
assumed that the autonomic responses
are not important for mediation but
that unobserved UCR’s are, one
simply begs the question. Why is one
response and not the other, the
existence of which has more cer-
tainty (through food deprivation
or GSR measurement), im-
portant for mediation? Furthermore,
it seems inconsistent or at least
theoretically not parsimonious for
Osgood to accept autonomic media-
tion in one learning instance and to
deny it in a second situation where it
has as great a possibility to mediate
transfer.


An S-S contiguity point of view is 
proposed by Birch and Bitterman 
(1949) who feel, “The results of the 
sensory pre-conditioning experiments 
require us to postulate a process of af-
ferent modification (sensory integra-
tion)... which takes place inde- 
pendently of need reduction” (p. 
302). They later assert that “the 
latent learning experiment may be 
understood as a complication of the 
sensory pre-conditioning experi-
ment” (1951, p. 300). For the
essential condition of sensory integra- 
tion they postulate, “When two af- 
ferent centers are continuously acti- 
vated, a functional relation is estab- 
lished between them such that the 
subsequent innervation of one will 
arouse the other” (p. 358). 

To this writer, the key phrases in 
the above seem to be “functional” 
and “such that” since in these words
lies the linkage between the mediated 
S-R and the afferent integrations. 
These verbal ambiguities lead to the 
ultimate conclusion that the diffi- 
culty in deciding upon the correct 
functional explanation for the medi- 
ating process resolves itself into a 
pseudo-problem for psychology. Per- 
haps neurologists will some day pro- 
vide the answer concerning whether 
or not the central connections are 
between afferent-efferent or afferent- 
afferent neurons. As a start in this 
direction, Harris (1948) has hy- 
pothesized that a type of neural sum- 
mation occurs when intersensory 
stimuli (e.g., visual and auditory) 
are paired. His review of the physio- 
logical evidence led to the hypothesis 
that there is high probability of such 
summation taking place in the mid- 
brain and brain stem. The inter- 
sensory facilitation noted in psycho- 
logical studies could then be ac- 
counted for as the behavioral cor- 
relate of this neural integration. 
Further, through some fractionated 
intermediary response, common ini-
tially to both sound and light, sensory 
preconditioning is supposed to occur. 
In this way, Harris attempts to pro- 
vide justification for a neural locus 
of an attentional or perceptual medi- 
ator (similar apparently to that of 
Osgood). The physiological data re- 
viewed by him, however, seem to pro- 
vide an equally plausible basis for an 
S-S or S-R psychology.

There is one different type of me- 
diated-response hypothesis which de- 
serves mention. Hebb (1949) has 
proposed a neural associationistic 
theory which includes the develop- 
ment of alternate neural routes in 
the CNS as a correlate of perceptual 
learning. The response which he 
gives as an example of a mediator in
the formation of a visual percept is 
the scanning eye movement from 
angle to angle along the sides outlin- 
ing a visually presented object. 
Clearly, performing any type of in- 
strumental response can be differen- 
tiated from the mediating response 
in Hebb’s theory. The eye move- 
ment can occur independently of 
what instrumental response the S 
must perform in any given task. In 
this sense, such an independent me- 
diator is also different from Osgood’s 
“detached responses” which are 
stated to be in some measure part of 
previous instrumental behavior. 
Consequently, it seems plausible to 
suggest that, if any type of mediating 
response takes place in sensory pre- 
conditioning, it may be of the Hebb- 
ian variety rather than the usual 
instrumental type seen in mediated 
generalization and mediate associa- 
tion. The rationale for such a pro- 
posal should become more apparent 
in the following section. 


A Comparison of Mediated Generaliza- 
tion and Sensory Preconditioning 


At this point it might be well to 
note more clearly the rationale which,
it is felt, forces a cautious approach 
upon any attempt at relating these 
concepts. As was mentioned in the in-
troduction, one characteristic unique
to the SPC learning paradigm is
the lack of any response require- 
ment during the latency or critical 
period. Other procedures which have 
been used (i.e., place vs. response, 
latent learning) to test the relative 
merits of S-S and S-R theories all 
require, and sometimes reward, spe- 
cific responses in such a stage. As is 
evident from the Wickens and Briggs 
experiment discussed above, this dis- 
tinction is not readily apparent in 
the analysis of the paradigm for 
mediated stimulus generalization. In- 
deed, in order to better understand 
both SPC and MSG, a step by step 
procedural comparison should be 
helpful. 

Consider specifically the break- 
down in Table 1. To illustrate the 
comparison, reference is made to 
Shipley’s study (1933) on mediated 
stimulus generalization, which Os- 
good (1953) cites as a classic example 
of both MSG and SPC. In order to 
make the comparison more applica- 
ble to S-R learning in general, an 
outline of the Wickens and Briggs 
study (1951) was included in the 
chart. Like SPC the experiments in- 
volved three stages, but the structure 
of these stages seems to be observably
different from SPC. Shipley first
paired a CS1 (faint light) with a
definite UCS1 (tap-on-cheek) to con-
dition a CR1 (eyeblink). The Wick-
ens and Briggs procedure differed
somewhat from that of Shipley by
utilizing instrumental learning dur-
ing the first stage. The Ss were re-
quired to give a common response
(“Now”) to the paired CSs (light-
tone presentation) or to either CS
separately. While these are easily
identified as straightforward con-
ditioning paradigms, such a
designation for Stage 1 of SPC
seems inaccurate. During the
preconditioning period the S is
simply exposed to contiguous
stimuli, such as buzzer and light,
known heretofore in conditioning
studies to be neutral stimuli
(i.e., NS). Note that no response,
instrumental or manipulatory, or
unconditioned, is required of the
subject. What is more, if a response
is made, unlike MSG, no
reinforcement of it by the
experimenter is given through reward
or punishment.

In the second stage of his experi-
ment, Shipley conditioned CR2 (fin-
ger withdrawal) to CS2, previously
UCS1 (tap-on-cheek). Wickens and
Briggs followed a similar procedure.
In SPC, one PC stimulus is used
as CS1 in a similar conditioning
or instrumental learning procedure.
Next, Shipley presented the
light without further conditioning
and this stimulus, CS1, elicited
CR2, finger withdrawal, in some Ss.
Wickens and Briggs obtained positive
transfer effect in both their experi-
mental groups (separate or contigu-
ous presentation of stimuli). In SPC
the final stage is a similar transfer
test wherein NS2 is used to elicit CR.
Note that in Shipley’s study, the
Wickens and Briggs investigation,
and in MSG experiments in general
some specified response and condi-
tioning is imposed initially. The at-
tempt at generalization to SPC of the
same type of mediation process rests
upon the assumption that some unob-
served or unobservable UCR occurs
to both NSs. Consequently, as noted
earlier, Silver and Meyer (1954) and
similarly Wickens and Briggs (1951)
hypothesize that any SPC effect is ex-
plained as a mediated resultant of a
type of cross-conditioning between
NS1 and NS2 established in the pre-
conditioning period. In the training
period although only NS1 is used as
CS1, an entire stimulus complex is
presumed present composed of NS1,
its stimuli derived from its UCR, and
those from its CR (UCR to NS2).
And, since these stimuli from its CR
are similar to those produced by the
UCR of NS2, in test trials one should
expect positive transfer effect.


TABLE 1
COMPARISON OF PROCEDURES IN MSG AND SPC

                                  MSG                                    SPC
Stage                    S          R             Reinforcement   S              R              Reinforcement
1 (Shipley)              CS1-UCS1   CR1           Specified       NS1-NS2        None required  None apparent
1 (Wickens and Briggs)   CS1-CS2    CR1           Specified
                         CS1, CS2   CR1           Specified
2 (Shipley)              CS2        CR2           Specified       CS1(NS1)-UCS   CR1            Specified
2 (Wickens and Briggs)   CS1        CR2           Specified
3 (Shipley)              CS1        CR2 mediated  -               CS2(NS2)       CR1 mediated   -
3 (Wickens and Briggs)   CS2        CR2 mediated  -

It remains an empirical question, 
however, concerning: (a) whether or 
not SPC and MSG are operationally 
distinguishable concepts, and (b) 
whether or not either one or both can 
be subsumed under the principles of 
conditioning. With regard to the 
latter point, there are at least the 
three sources of published data dis- 
cussed previously which appear op- 
posed to a conditioning interpreta- 
tion of SPC. First, there is the find- 
ing that stable backward SPC ex- 
ists—to as great a degree as simul- 
taneous SPC; and, secondly, Brog- 
den (1951) has reported that appar- 
ently in SPC the strength of the pre-
conditioning association is not a
function of N (number of PC pair- 
ings). 

Thirdly, the role of the response in 
SPC seems unimportant. Although 
Bahrick’s results noted earlier were 
not definitive for SPC, he did obtain 
a positive transfer effect in his test 
situation for all groups. This gen- 
eralization occurred despite the fact 
that the rats were exposed to PC 
stimuli under hunger and thirst 
motivation, but trained and tested 
on an avoidance problem when sati- 
ated for hunger and thirst. Bahrick’s 
data, as a result, suggest at least two 
difficulties for an S-R interpretation 
of mediation in SPC. The autonomic 
responses present during exposure, 
which might have mediated the 
transfer, were either nonexistent or 
present in only a slight degree during 
training and testing. Even more 
striking is the fact that the auto-
nomic processes dominant during
training and testing (sympathetic
processes) were opposed to those
present during the initial pairing of 
the PC stimuli. Despite both condi- 
tions positive transfer occurred. Fur- 
thermore, in the SPC study (Seidel, 
1958) cited earlier, the writer sub- 
stantiated Bahrick’s finding in a de-
sign which included degrees of sim-
ilarity between autonomic responses
present during preconditioning (the 
exposure period) and those present 
during the training-testing stages. 
As was pointed out in the analysis of 
that experiment, the most probable 
mediating responses must have been 
either the autonomic response-com- 
plex, per se, or that complex com- 
bined with other unobserved re- 
sponses. In either case, differences 
among experimental groups should 
have appeared if response-produced 
mediation were involved. Appar-
ently, these two experiments indicate
that even when a given response is 
specifically made consistent with the 
PC stimuli and thereby allowed the 
opportunity of serving as the basis
for mediation, it has no effect in the 
SPC paradigm. In addition, the find- 
ing that the GSR in SPC does not 
seem to follow the normal extinction 
curve argues against an S-R inter- 
pretation of SPC. 

The caution advised in attempting 
to subsume SPC under S-R learning 
theory by calling it an example of 
conditioning seems clearly justified. 
From the above data it appears that 
number of repetitions (i.e., N), tem- 
poral order, and specific responses 
have little effect on the establishment 
of stimuli association in SPC. On 
the other hand, the importance of
these factors in conditioning is well- 
established empirically. 

Still to be considered is the ques-
tion (a) whether or not SPC and MSG
are operationally distinguishable con- 
cepts. Returning to the analysis of 
Shipley’s MSG study, it will be re-
called that in Stage 1 a faint light
(CS1) was conditioned to elicit an
eyeblink (CR1). Since at this point
the link between light and tap-on-
cheek (UCS1) was established, pre-
sumably conditioning provided the 
basis for mediation. In like manner 
Wickens and Briggs established S-R 
connections initially. Ostensibly at 
least the SPC operations (Table 1) 
do not establish such links. In addi- 
tion, since SPC data appear in con- 
tradiction to certain conditioning 
principles, it is implied from the fore- 
going discussion that although MSG 
would fit S-R theory, it should not
yield data consistent with SPC.
There is an indirect suggestion of 
such a possibility if one considers the 
studies (Bugelski & Scharlock, 1952;
Peters, 1935) on mediate association 
(A-B, B-C, A-C) as a verbal parallel 
to the MSG paradigm. The data 
gathered so far from these studies indicate
that a certain order of presentation
of S and R in each stage is essential
to the attainment of facilitation
effects (mediate association) in
the [third] stage. If it is recalled from
Table 1 that MSG involves conditioning
as the basis for mediation, it
is apparent that a similar type of
order (the CS-UCS order) should be
of prime importance in the achievement
of mediated generalization. In
[sum], while this order principle seems
to apply to both mediate association
and mediated stimulus generalization,
as noted above, it apparently
does not hold for SPC. A feature
[peculiar] to the preconditioning procedure
which seems related to this
[difference] is the previously mentioned
[absence] of any required instrumental
or conditioned response during
the preconditioning period. In
addition to the possible difference in
the [order] parameter governing SPC
and MSG, other SPC findings discussed
offer the suggestion that N


and mediating-response factors are 
not the same either. 

What this over-all comparison of 
MSG and SPC indicates is that, al- 
though both paradigms yield similar 
transfer effects in some instances, 
SPC alone appears governed by a 
different set of laws from that of 
classical conditioning. It is empha- 
sized that this is a tentative working 
hypothesis suggested by both partial 
and indirect sources of data. Whether 
or not S-R concepts are able to ac- 
count for SPC and whether MSG 
and SPC are actually two names for a
single learning process or reflect dif- 
ferent types of learning await a sys- 
tematic parametric comparison be- 
tween the two concepts. Further- 
more, if learning is a two-stage proc- 
ess as Mowrer has already suggested, 
it may be that such a comparison 
could yield the parameters for these 
factors. At any rate, in the most 
conservative sense, one might simply 
state that the SPC studies have 
given results different from those 
previously gotten in conditioning or 
those implied by any S-R media- 
tional learning hypothesis. 


CONCLUSIONS 


If SPC is to be explained by the 
same principles as classical conditioning,
as Reid has suggested (1952),
in addition to following the laws of 
conditioning, sensory precondition- 
ing should be present in an organism 
simple enough to make symbolic 
functioning an untenable interpreta- 
tion. From the available compara- 
tive literature reviewed by this 
writer, SPC does seem to exist in
such organisms. 


It is noteworthy that Bahrick and
Reid used the same apparatus for
preconditioning and training, and
these authors found that presenting
the control group with only the test
stimulus in preconditioning resulted
in the same degree of transfer for experimental
and control groups. The
writer, as well as Silver and Meyer, 
on the other hand, utilized two dis- 
tinctly different pieces of apparatus 
for these conditions; and they ob- 
tained significantly different degrees 
of transfer between experimental 
and control animals. The apparent 
paradox seems obviated if one recog- 
nizes that in the preconditioning 
setting the contiguous sensory stim- 
uli are not limited to those which the 
experimenter has designated. Rather, 
the sensory associations are formed 
among the particular situational cues 
to which the animal attends. These 
sensory associations, thus, consti- 
tute a stimulus complex, in which 
the tone and/or light represent but 
one or two components of the total- 
ity. Thus, it is proposed that the 
important stimuli for the organism 
in the preconditioning situation are 
constituted by the stimulus complex 
to which it attends; and all that the 
experimenter can hope to do is to 
heighten the probability that the 
stimuli in which he is interested will 
be included in the complex of interest 
to the rat. Consequently, there ex- 
ists the need for additional animal 
studies with control of the exposure 
variable and apparatus similarity, 
both of which bear on the subject of 
more definite identification of the 
stimulus. 

At the outset of the paper, the 
initial question asked was whether or 
not sensory preconditioning required 
independent consideration as a phe- 
nomenon in learning. Although the 
data are by no means exhaustive, 
they do suggest that it tentatively 
does require such consideration. In
addition, regarding the second ques- 
tion of the pertinent laws for sensory 
preconditioning, whether the param- 
eters of preconditioning and S-R 


learning differ or are the same should 
be further investigated in the par- 
adigm specified above. However, it 
is apparent at this point that the role 
of the response in SPC is a minor one 
(Bahrick, 1953; Coppock, 1958; 
Seidel, 1958). 

Concerning the third question of 
the problems posed for the learning 
theorist, one issue clearly defined at 
present is that reinforcement as clas- 
sically understood (Hull, 1943) seems 
to be an unnecessary condition for 
SPC to be effective. The value of 
this contribution is not to be under- 
estimated. Indeed, the very concept 
of reinforcement (drive reduction) as 
developed by Hull and his supporters 
has been the center of a major con- 
troversy in learning theory for many 
years. To this end, the sensory pre- 
conditioning research has proved 
fruitful. This point is epitomized by 
the fact that Osgood (1953), an S-R 
reinforcement theorist, has conceded 
that the SPC data provide a strong
case for the elimination of reinforce- 
ment as a necessary condition for 
learning. Further, if the writer’s
autonomic interpretation of Osgood’s 
mediational analysis of learning is 
correct, the SPC data seem to pose 
difficulties for the latter’s peripheral 
mediation hypothesis. At present, a
more tenable approach would be the 
Hebbian-type analysis proposed by 
the writer or the S-S view offered by 
Birch and Bitterman. Finally, if one 
entertains the possibility for two-factor
learning,


parison of MSG and SPC, which at 


esses, should prove fruitful for learn- 
ing theory. Thus, granted the need 
for more research, the sensory pre- 
conditioning paradigm already seems 
to have provided a valuable building 
block for the theoretical development 
of psychology.


A NEW PSYCHOPHYSICAL METHOD: METHOD OF TRANSPOSITION
OR EQUAL-APPEARING RELATIONS


TADASU OYAMA 
Hokkaido University, Japan 


In Fig. 1 the right half of the 
straight line appears longer than the 
left half, though they are physically 
of the same length; this is the well-known
Müller-Lyer illusion. To determine
the amount of illusion, one
of the traditional psychophysical 
methods is to let the observer adjust 
the length of the right or left half of 
the line until it appears equal to the 
other half, and the difference in 
length between the two halves is sup- 
posed to represent the amount of il- 
lusion. 

But does this procedure give the 
true measure of illusion? When the 
adjustment is complete, the stimulus 
pattern is no longer the same as the 
original; two halves of the line after 
adjustment are different in length, 
whereas in the original Müller-Lyer
figure they are the same. In other 
words, the very operation of measurement
changes the stimulus pattern
from the Müller-Lyer figure to something
else. And we have no assurance
that what we measure by this method 
is the amount of illusion as it exists 
in the original stimulus pattern. 

There are many other classical 
psychophysical methods besides the 
method of adjustment illustrated 
here, but they are alike in that a pair 
of equivalent stimuli is sought for 
measuring the amount of illusions, 
and the measuring operation inevi- 
tably alters the stimulus pattern. It 
would certainly be preferable to have
a method of measurement which can
be applied to the stimulus pattern
without destroying it. To meet this
demand, a new psychophysical method
has been devised in which the
original stimulus pattern is left intact
while the apparent relation between
its stimulus parts is measured. The
observer is asked to find the neutral
comparison pattern which has the
same apparent relation between its
parts as the original stimulus pattern
has. In other words, the apparent
relation between the standard pair
of stimuli is transposed to the comparison
pair just as a melody is transposed
from C major to D major. For
this reason, we may name the new
method the method of transposition
or the method of equal-appearing relations.
This method will be illustrated
and discussed in the following sections.

FIG. 1. MÜLLER-LYER ILLUSION FIGURE.


SOME APPLICATIONS OF THE
NEW METHOD

Müller-Lyer Illusion


Oyama (1955) used a “transposition”
pattern shown in Fig. 2 and
instructed his Ss to adjust the right
half until it had the same apparent
ratio to the constant left half as the
ratio of the corresponding [halves of]
the Müller-Lyer figure in Fig. 1.
Seven Ss served under five conditions
employing stimuli of various
angles and lengths of arrows. The Ss
made the matches in the traditional
way and also according to the new
procedure advocated here.

FIG. 2. THE TRANSPOSITION PATTERN USED
IN THE EXPERIMENT OF MÜLLER-LYER ILLUSION.
(Redrawn from Oyama [1955])

FIG. 3. MÜLLER-LYER ILLUSION AS A FUNCTION
OF THE LENGTH AND THE ANGLE OF ARROWS.
The solid lines represent the results
of the method of transposition and the
dotted lines show the results of the method
of adjustment.

Results are shown in Fig. 3. There 
are considerable discrepancies be- 
tween the results of the new and the 
traditional methods. This is only 
natural because the two methods 
measure different stimulus patterns 
as pointed out above. We may say 
that the results of the new method 
represent the Müller-Lyer illusion
more directly than do the results of 
the old method though more syste- 
matic analysis of this phenomenon is 
desirable. That the stimulus pattern 
is left intact in this method is its 
major advantage over traditional 
methods. 
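
The contrast between the two measuring operations can be made concrete with a toy simulation. The sketch below is not Oyama's procedure and uses no data from his experiments. It assumes a hypothetical observer for whom the arrowheads stretch the apparent length of the right half by a gain that grows with the physical ratio of the two halves; the function `apparent_gain`, the constant `k`, and the length of 100 units are all invented for illustration. Under that assumption the method of adjustment alters the figure and so measures the gain at a different ratio, whereas the method of transposition reads off the apparent relation in the intact figure.

```python
def apparent_gain(right, left, k=0.2):
    """Hypothetical factor by which the arrowheads stretch the right half.

    Assumed for illustration only: the gain depends on the physical
    ratio right/left, so altering the figure alters the illusion.
    """
    return 1.0 + k * right / left


def solve_increasing(f, target, lo, hi, tol=1e-9):
    """Bisection search for x in [lo, hi] with f(x) == target (f increasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def adjustment_measure(left):
    """Method of adjustment: shorten the right half of the illusion
    figure itself until both halves appear equal."""
    def apparent_right(right):
        return right * apparent_gain(right, left)

    matched = solve_increasing(apparent_right, left, 0.0, left)
    return left - matched


def transposition_measure(left):
    """Method of transposition: leave the figure intact (equal halves)
    and reproduce its apparent ratio in a neutral pattern without
    arrowheads, where apparent ratio is taken to equal physical ratio."""
    standard_ratio = apparent_gain(left, left)
    matched = solve_increasing(lambda right: right / left,
                               standard_ratio, left, 2.0 * left)
    return matched - left


if __name__ == "__main__":
    print("adjustment:   ", round(adjustment_measure(100.0), 2))
    print("transposition:", round(transposition_measure(100.0), 2))
```

With these invented values the two operations give different amounts (about 14.6 and 20 units), which is the kind of discrepancy to be expected whenever the illusion depends on the very quantity being adjusted. Both amounts are expressed in physical units of the adjusted line.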


Figural After-Effects 


If, immediately after inspection of 
a circle, another circle is presented 
at the same place in the visual field, 
the second circle appears smaller or 
larger than a third circle which is 
physically the same size as the second 
circle but is shown at a neutral place. 
This phenomenon has been named 
figural after-effect. In the traditional
method the third circle is used as the
variable comparison stimulus by 
means of which the amount of figural 
after-effect is measured (Oyama, 
1954; Sagara & Oyama, 1957). Our 
new method can also replace this 
method of measuring figural after- 
effects. 

Oyama (1955) prepared 10 inspec- 
tion cards, on each of which was 
drawn a circle of variable size on the 
right side of a fixation mark. On the 
test card were two circles of the same 
size on the right and left sides of a 
fixation mark. In addition to these
cards, he made a series of “transposition”
cards, on each of which
were drawn a circle of constant diameter
on the left side and another
circle on the right whose diameter
was varied in 1 mm. steps. Subjects
were required to memorize the relation
between the apparent sizes of
the two circles in the test card at a
moment immediately after the inspection,
and to choose from the “transposition”
cards in their hands a card
which had the same apparent relation
between its two circles as the circles
had on the test card.

Results are shown in Fig. 4 for
both the new and the traditional
methods with the same five Ss. The
two curves in this figure are similar
enough to be substituted for one another.
To obtain practically the same
results, however, the new method required
only two experimental sessions
of about a half hour for each S,
whereas the traditional method required
10 sessions. This means that
the new method gave more information
per trial than did the traditional
method and this may be counted as
one of the advantages of the former
over the latter.

FIG. 4. FIGURAL AFTER-EFFECT AS A FUNCTION
OF THE SIZE OF INSPECTION-CIRCLE. The
solid line represents the results of the method
of transposition and the dotted line indicates
the results of the method of constant stimuli.
The diameter of test-circle was 4 cm in these
experiments. (Redrawn from Oyama [1955])

Wertheimer (1954) had already 
used practically the same procedure 
in his study of figural after-effect.


Size Constancy 


In the traditional experiment on 
size constancy, the experimenter presents
two circles at different distances
from the observer. One circle
has a constant size (the standard)


standard, 

Makino (1956) employed the 
method of transposition instead of 
the traditional method in his study 
of size constancy. He presented two
physically equal circles, one at a


cards in their hands. His results were 
very clear-cut as shown in Fig. 5. The 


FIG. 5. PERCEIVED SIZE AS A FUNCTION OF
THE OBSERVATION DISTANCE. The solid line
indicates the results of binocular observation
and the dotted line represents those
of monocular observation. (Redrawn from
Makino [1956])

cal distance. Results obtained with
binocular observation were fitted by
the equation: S=1.68D-0.35 and the
results with monocular observation
by the equation: S=1.77D-0.18, in


studied size constancy in stereoscopic 


°sram of two white balls on a table, 


) S lin such a 
situation, his analysis had to remain
stereoscopic ability as measured by
the Umezu-Shimazu test (1952). 


SOME THEORETICAL 
CONSIDERATIONS 


It has been frequently argued that 
the responses in the orthodox psy- 
chophysical methods ought to be re- 
stricted to the two categories of yes 
and no, or to the three categories of 
“greater,” “less,” and “doubtful.” 
This principle holds in the new 
method as well. The observer com- 
pares the standard pair of stimuli 
with the “transposition” (compari- 
son) pair of stimuli, and judges 
whether the apparent ratio or differ- 
ence between the two components in 
the former pair is larger or smaller 
than that in the latter. In some cases, 
he reports such judgments explicitly, 
and, in other cases, he makes the ad- 
justments or choices of the transposi- 
tion stimuli according to his implicit 
judgments. In these respects, the 
new method has nothing new, except 
that the judgments are concerned 
with the apparent ratio or difference 
in each pair rather than the apparent 
magnitude of each stimulus.

Transposition is the sole new re- 
quirement in the new method. The 
observer is asked to transpose the 
phenomenal relation, i.e., the appar- 
ent ratio, difference, or some other 
relation from one stimulus pattern 
to another. Gestalt psychologists 
claimed that man is able to transpose
phenomenal relations in many perceptual
functions just as he can transpose
a melody (Köhler, 1929). For
example, a visual figure remains the 
same in shape, regardless of its
brightness, location, or size just as a
melody remains the same when it is 
played in different keys. Die Trans- 
ponierbarkeit, the possibility of trans- 
position, in the perceptual dimen- 
sion, is the only assumption made in 
our method. There is no other requirement
such as fractionation, multiplication,
or direct estimation of
apparent magnitude of stimuli (Ste- 
vens, 1957). 

The psychophysical methods may 
be classified in many ways. They 
may be classified according to the 
mode of presentation or control of 
stimuli, as in the method of adjust- 
ment, the method of limits, or the 
method of constant stimuli. Or, they 
may be classified according to the ob- 
ject to be measured, as in the method 
of just noticeable differences, the 
method of equivalents, or the method 
of equal sense distances. Our new 
method may be called the method of 
equal-appearing relations by the lat- 
ter criterion, but it does not belong 
in the former classification since it 
bears no intrinsic relationship to any 
method of presentation or control of 
stimuli. Sometimes it is combined 
with the method of adjustment as illustrated
in the treatment of the Müller-Lyer
illusion, and sometimes with the
method of constant stimuli, as in 
Makino’s experiment on size con- 
stancy. It may well be combined 
with the method of limits or other 
psychophysical procedures. 

If we do not wish to use the ambig- 
uous term “relation,” we may call 
the new method the method of equal- 
appearing differences, or the method 
of equal-appearing ratios as the case 
may be. However, such names should 
not imply psychological interval or
ratio scales as the basis of measure- 
ment in contrast with the method of 
bisection or the method of fractiona- 
tion (Stevens, 1951). In our method, 
the measurements are made on the 
physical scales of the transposition 
stimuli and the results are repre- 
sented in physical units. 

If we interpret “the method of
transposition” or “the method of
equal-appearing relations” in a
broader sense, some of the traditional
methods may be included in this
method. A group of psychophysical 
methods which are called the method 
of equal sense distances, the method 
of equal-appearing intervals, or the 
method of bisection are in a sense 
variations of the method of equal- 
appearing relations, if we under- 
stand sense distance, interval, etc., as 
a relation between two stimulus lim- 
its. When we seek by these methods 
the brightness x which appears half- 
way between two brightnesses a and 
b, we equate the apparent interval 
between a and x to that of x and b. 
In other words, we transpose the 
relation between a and x to the rela- 
tion between x and b. In this sense, 
these methods are variations of the 
method of equal-appearing relations 
or the method of transposition. 
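
A small numerical sketch may help to show why bisection is a special case of transposing a relation. In the sketch, `psi` is a hypothetical stand-in for an observer's apparent brightness; a logarithmic form is assumed purely for illustration, and the method itself requires no such scale. The search finds the physical value x whose apparent interval from a equals the apparent interval from x to b, and the result is read off in physical units.

```python
import math


def psi(intensity):
    """Hypothetical apparent magnitude (assumed logarithmic, illustration only)."""
    return math.log(intensity)


def bisect_interval(a, b, apparent=psi, tol=1e-9):
    """Find x between a and b such that the apparent interval (a, x)
    equals the apparent interval (x, b).

    Assumes a < b and that `apparent` is increasing on [a, b].
    """
    lo, hi = a, b
    while hi - lo > tol:
        x = (lo + hi) / 2.0
        lower_interval = apparent(x) - apparent(a)
        upper_interval = apparent(b) - apparent(x)
        if lower_interval < upper_interval:
            lo = x   # x still appears too close to a
        else:
            hi = x   # x appears too close to b
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Under the assumed logarithmic form the match is the geometric mean.
    print(round(bisect_interval(10.0, 1000.0), 3))
```

A different assumed form of `psi` would give a different x, but the operation is the same in every case: a relation observed in one pair is reproduced in another pair.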
More recently, some investigators 
have made intermodal transposi- 
tions of apparent ratios. For in- 
stance, J. C. Stevens asked his Ss to 
adjust the loudness of the second of 
two noises so that the apparent 
ratio of the two sounds equaled the 
apparent brightness ratio between 
two luminous targets (Stevens, 1957). 
This procedure should be called the 
method of equal-appearing ratios, 
and consequently it is a kind of 
method of equal-appearing relations. 
When one of the perceptual dimen- 
sions in the intermodal transposition 
is the apparent length of a straight 
line and the observer is instructed to 
mark a point dividing the line into 
two parts whose ratio appears equal 
to that of two magnitudes in another 
psychological dimension, the situa- 
tion becomes just the same as that of 
a continuous rating scale. Therefore, 
it is also possible to classify such rat- 
ing methods with the method of 
transposition in a broad sense. 
Even such a group as the so-called 
method of fractionation (Geiger & 
Firestone, 1933), the method of mul- 


tiplication (Hanes, 1949), and the 
constant sum method (Metfessel, 
1947) could be regarded as examples 
of our method. In these methods, the 
S is required to translate the appar- 
ent ratio between two given stimuli 
into a number, or to divide a number 
to express a perceived ratio. Num- 
bers constitute a ratio scale in mathe- 
matics, but we are not sure yet if 
they constitute a ratio scale in the 
psychological world as well. Accord- 
ingly it is better to understand such 
“translation” as an intermodal trans- 
position between two perceptual di- 
mensions, one corresponding to the 
stimulus continuum, and the other 
to the verbally expressed numbers.


SUMMARY 


In many experiments dealing with 
perceptual phenomena, investigators 
try to find a stimulus which appears 
equal to a standard stimulus. It often
happens, however, that the procedure 
involved in finding 
stimulus al 
so that the 
on the origi 


avoid this difficulty, a new 


ones is that it 


od, the apparent 
relation between the components of 


a stimulus pattern is observed and
the same relation is sought in a 
neutral stimulus pattern. In other 
words, the apparent relation is trans- 
posed to a different group of stimulus 
elements, just as a melody is trans- 
posed in different keys. 

This method was applied to the
measurement of the Müller-Lyer illusion,
figural after-effect, and size
constancy, and the results revealed
that the new method had additional
advantages over traditional methods, 
viz., greater amount of information 
per trial, economy of time, and light- 
ening the task of the S. 

This method will find its most use- 
ful application in the measurement 
of strongly structured, brief, and 
hard to reproduce or continuously 
changing perceptual phenomena. 

The new method requires no more 
ability on the part of Ss than to use 


the same categories that are used in 
traditional psychophysical methods. 
It makes no assumption of underly- 
ing interval or ratio scales. In these 
respects, it is not different from the 
traditional methods. 

If we interpret the method of trans- 
position in a broad sense, it includes 
the method of equal sense distances, 
some rating scale methods, and also 
so-called ratio estimation or ratio 
production methods.


ERRATUM
In the article “A Learning Theory Approach to Research in Schizophrenia”
by Sarnoff A. Mednick (Psychol. Bull., 1958, 55, 316-327), Reference 17
should read: Dunn, W. L., Jr. Visual discrimination of schizophrenic subjects
as a function of stimulus meaning. J. Pers., 1954, 23, 48-64.


CONVERGENT AND DISCRIMINANT VALIDATION BY THE
MULTITRAIT-MULTIMETHOD MATRIX*


DONALD T. CAMPBELL
Northwestern University

AND DONALD W. FISKE
University of Chicago


In the cumulative experience with 
measures of individual differences 
over the past 50 years, tests have 
been accepted as valid or discarded 
as invalid by research experiences of 
many sorts. The criteria suggested in 
this paper are all to be found in such 
cumulative evaluations, as well as in 
the recent discussions of validity. 
These criteria are clarified and imple- 
mented when considered jointly in 
the context of a multitrait-multi- 
method matrix. Aspects of the valida- 
tional process receiving particular 
emphasis are these: 

1. Validation is typically convergent,
a confirmation by independent
measurement procedures. Independence
of methods is a common denominator
among the major types of
validity (excepting content validity)
insofar as they are to be distinguished
from reliability.

* The new data analyses reported in this
paper were supported by funds from the
Graduate School of Northwestern University
and by the Department of Psychology of the
University of Chicago. We are also indebted
to numerous colleagues for their thoughtful
criticisms and encouragement of an earlier
draft of this paper, especially Benjamin S.
Bloom, R. Darrell Bock, Desmond S. Cartwright,
Loren J. Chapman, Lee J. Cronbach,
Carl P. Duncan, Lyle V. Jones, Joe Kamiya,
Wilbur L. Layton, Jane Loevinger, Paul E.
Meehl, Marshall H. Segall, Thornton B. Roby,

2. For the justification of novel 
trait measures, for the validation of 
test interpretation, or for the estab- 
lishment of construct validity, dis- 
criminant validation as well as con- 
vergent validation is required. Tests 
can be invalidated by too high cor- 
relations with other tests from which 
they were intended to differ. 

3. Each test or task employed for 
measurement purposes is a trait- 
method unit, a union of a particular 
trait content with measurement pro- 
cedures not specific to that content. 
The systematic variance among test
scores can be due to responses to the
measurement features as well as re- 
sponses to the trait content. 

4. In order to examine discrim- 
inant validity, and in order to esti- 
mate the relative contributions of 
trait and method variance, more than 
one trait as well as more than one 
method must be employed in the vali- 
dation process. In many instances it 
will be convenient to achieve this 
through a multitrait-multimethod 
matrix. Such a matrix presents all of 
the intercorrelations resulting when 
each of several traits is measured by 
each of several methods. 

To illustrate the suggested validational
process, a synthetic example is
presented in Table 1. This illustration
involves three different traits,
each measured by three methods,
generating nine separate variables. It
will be convenient to have labels for
various regions of the matrix, and
such have been provided in Table 1.
The reliabilities will be spoken of in
terms of three reliability diagonals,
one for each method. The reliabilities
could also be designated as the monotrait-monomethod
values. Adjacent
to each reliability diagonal is the
heterotrait-monomethod triangle. The
reliability diagonal and the adjacent
heterotrait-monomethod triangle
make up a monomethod block. A heteromethod
block is made up of a validity
diagonal (which could also be designated
as monotrait-heteromethod
values) and the two heterotrait-heteromethod
triangles lying on each side of
it. Note that these two heterotrait-heteromethod
triangles are not identical.

TABLE 1
A SYNTHETIC MULTITRAIT-MULTIMETHOD MATRIX

                    Method 1             Method 2             Method 3
Traits          A₁     B₁     C₁     A₂     B₂     C₂     A₃     B₃     C₃
Method 1  A₁  (.89)
          B₁         (.89)
          C₁                (.76)
Method 2  A₂                       (.93)
          B₂
          C₂
Method 3  A₃                                            (.94)
          B₃                                                   (.92)
          C₃                                                          (.85)

Note.—The validity diagonals are the three sets of italicized values. The reliability
diagonals are the three sets of values in parentheses. Each heterotrait-monomethod
triangle is enclosed by a solid line. Each heterotrait-heteromethod triangle is
enclosed by a broken line.

In terms of this diagram, four as- 
pects bear upon the question of valid- 
ity. In the first place, the entries in 
the validity diagonal should be sig- 
nificantly different from zero and 
sufficiently large to encourage further 
examination of validity. This re- 
quirement is evidence of convergent 
validity. Second, a validity diagonal 
value should be higher than the val- 
ues lying in its column and row in the 
heterotrait-heteromethod triangles. 

That is, a validity value for a variable
should be higher than the correlations
obtained between that variable and
any other variable having neither
trait nor method in common. This
requirement may seem so minimal
and so obvious as to not need stating,
yet an inspection of the literature
shows that it is frequently not met,
and may not be met even when the
validity coefficients are of substantial 
size. In Table 1, all of the validity 
values meet this requirement. A 
third common-sense desideratum is 
that a variable correlate higher with 
an independent effort to measure the 
same trait than with measures de- 
signed to get at different traits which 
happen to employ the same method. 
For a given variable, this involves 
comparing its values in the validity 
diagonals with its values in the heterotrait-monomethod
triangles. For
variables A₁, B₁, and C₁, this requirement
is met to some degree. For the
other variables, A₂, A₃, etc., it is not
met and this is probably typical of
the usual case in individual differences
research, as will be discussed in
what follows. A fourth desideratum 
is that the same pattern of trait in- 
terrelationship be shown in all of the 
heterotrait triangles of both the mon- 
omethod and heteromethod blocks. 
The hypothetical data in Table 1 
meet this requirement to a very 
marked degree, in spite of the dif- 
ferent general levels of correlation in- 
volved in the several heterotrait tri- 
angles. The last three criteria pro- 
vide evidence for discriminant va- 
lidity. 
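
The first three comparisons described above are mechanical enough to be sketched in code. The example below is only an illustration under stated assumptions: the correlations are invented (they are not the Table 1 values), only two methods are used, and the cutoff `min_validity` is an arbitrary stand-in for "significantly different from zero and sufficiently large." The fourth desideratum, similarity of pattern across the heterotrait triangles, is not checked here.

```python
from itertools import combinations

TRAITS = ("A", "B", "C")
METHODS = (1, 2)

# Hypothetical correlations, invented for illustration (NOT the Table 1 values).
RAW = {
    # heterotrait-monomethod triangle, Method 1
    "A1 B1": .50, "A1 C1": .40, "B1 C1": .35,
    # heterotrait-monomethod triangle, Method 2
    "A2 B2": .65, "A2 C2": .35, "B2 C2": .30,
    # validity diagonal (same trait, different methods)
    "A1 A2": .60, "B1 B2": .58, "C1 C2": .55,
    # heterotrait-heteromethod triangles
    "A1 B2": .20, "A1 C2": .12, "B1 A2": .22,
    "B1 C2": .11, "C1 A2": .10, "C1 B2": .12,
}


def var(name):
    """Turn a label such as 'A1' into a (trait, method) pair."""
    return (name[0], int(name[1:]))


CORR = {frozenset(var(n) for n in key.split()): r for key, r in RAW.items()}


def corr(x, y):
    return CORR[frozenset((x, y))]


def region(x, y):
    """Name the region of the matrix in which the pair (x, y) falls."""
    same_trait, same_method = x[0] == y[0], x[1] == y[1]
    if same_trait and same_method:
        return "reliability diagonal"
    if same_trait:
        return "validity diagonal"
    if same_method:
        return "heterotrait-monomethod triangle"
    return "heterotrait-heteromethod triangle"


def check_criteria(min_validity=0.30):
    """Return a list of (criterion, variable, other variable) failures."""
    failures = []
    for trait in TRAITS:
        for m1, m2 in combinations(METHODS, 2):
            x, y = (trait, m1), (trait, m2)
            validity = corr(x, y)
            # 1. Convergent validity: the validity value is large enough.
            if validity < min_validity:
                failures.append((1, x, y))
            # 2. Validity exceeds the heterotrait-heteromethod values
            #    in its own row and column.
            for v, other_method in ((x, m2), (y, m1)):
                for t in TRAITS:
                    if t != trait and corr(v, (t, other_method)) >= validity:
                        failures.append((2, v, (t, other_method)))
            # 3. Validity exceeds the heterotrait-monomethod values
            #    involving the same variable.
            for v in (x, y):
                for t in TRAITS:
                    if t != trait and corr(v, (t, v[1])) >= validity:
                        failures.append((3, v, (t, v[1])))
    return failures


if __name__ == "__main__":
    for criterion, v, other in check_criteria():
        print("criterion", criterion, "not met:", v, "vs", other,
              "in the", region(v, other))
```

In the invented data the third requirement fails for the Method 2 measures of traits A and B, whose correlation with each other exceeds their validity values. This is the situation described above as probably typical of individual differences research.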

Before examining the multitrait- 
multimethod matrices available in 
the literature, some explication and 
justification of this complex of re- 
quirements seems in order. 

Convergence of independent methods: 
the distinction between reliability and 
validity. Both reliability and validity 
concepts require that agreement be- 
tween measures be demonstrated. A 
common denominator which most 
validity concepts share in contradis- 
tinction to reliability is that this 
agreement represent the convergence 
of independent approaches. The con- 
cept of independence is indicated by 


such phrases as “external variable,” 
“criterion performance,” “behavioral 
criterion” (American Psychological 
Association, 1954, pp. 13-15) used in 
connection with concurrent and pre- 
dictive validity. For construct valid- 
ity it has been stated thus: ‘‘Numer- 
ous successful predictions dealing 
with phenotypically diverse ‘criteria’ 
give greater weight to the claim of 
construct validity than do... pre- 
dictions involving very similar be- 
havior” (Cronbach & Meehl, 1955, p. 
295). The importance of independ- 
ence recurs in most discussions of 
proof. For example, Ayer, discussing 
a historian’s belief about a past 
event, says “if these sources are 
numerous and independent, and if 
they agree with one another, he will 
be reasonably confident that their ac- 
count of the matter is correct” (Ayer, 
1954, p. 39). In discussing the man- 
ner in which abstract scientific con- 
cepts are tied to operations, Feigl 
speaks of their being “fixed” by “tri- 
angulation in logical space” (Feigl, 
1958, p. 401). 

Independence is, of course, a mat- 
ter of degree, and in this sense, relia- 
bility and validity can be seen as re- 
gions on a continuum. (Cf. Thur- 
stone, 1937, pp. 102-103.) Reliability 
is the agreement between two efforts 
to measure the same trait through 
maximally similar methods. Validity 
is represented in the agreement be- 
tween two attempts to measure the 
same trait through maximally differ- 
ent methods. A split-half reliability 
is a little more like a validity coeffi- 
cient than is an immediate test-retest 
reliability, for the items are not quite 
identical. A correlation between 
dissimilar subtests is probably a reli- 
ability measure, but is still closer to 
the region called validity. 

Some evaluation of validity can 
take place even if the two methods
are not entirely independent. In
Table 1, for example, it is possible 
that Methods 1 and 2 are not en- 
tirely independent. If underlying 
Traits A and B are entirely inde- 
pendent, then the .10 minimum cor- 
relation in the heterotrait-hetero- 
method triangles may reflect method 
covariance. What if the overlap of 
method variance were higher? All 
correlations in the heteromethod 
block would then be elevated, includ- 
ing the validity diagonal. The hetero- 
method block involving Methods 2 
and 3 in Table 1 illustrates this. The 
degree of elevation of the validity 
diagonal above the heterotrait-heter- 
omethod triangles remains compa- 
rable and relative validity can still be 
evaluated. The interpretation of the 
validity diagonal in an absolute fashion
requires the fortunate coincidence
of both an independence of traits
and an independence of methods,
represented by zero values in the
heterotrait-heteromethod triangles.
But zero values could also occur
through a combination of negative
correlation between traits and positive
correlation between methods, or
the reverse. In practice, perhaps all
that can be hoped for is evidence for
relative validity, that is, for common
variance specific to a trait, above and
beyond shared method variance.

Discriminant validation. While the
usual reason for the judgment of invalidity
is low correlations in the
validity diagonal (e.g., the Downey
Will-Temperament Test [Symonds,
1931, p. 337ff]), tests have also been
invalidated because of too high correlations
with other tests purporting
to measure different things. The
classic case of the social intelligence
tests is a case in point. (See below
and also [Strang, 1930; R. Thorndike,
1936].) Such invalidation occurs
when values in the heterotrait-heteromethod
triangles are as high as those
in the validity diagonal, or even 
where within a monomethod block, 
the heterotrait values are as high as 
the reliabilities. Loevinger, Gleser, 
and DuBois (1953) have emphasized 
this requirement in the development 
of maximally discriminating subtests. 
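
As a rough illustration of the two warning signs just described, the following Python sketch flags a measure whose heterotrait-heteromethod values rival its validity value, or whose heterotrait-monomethod values rival its reliability. The function name and every numerical value are hypothetical; none is taken from a table in this paper.

```python
def discriminant_flags(validity, hetero_hetero, hetero_mono, reliability):
    """Return the two warning signs of discriminant invalidity for one measure."""
    return {
        # heterotrait-heteromethod values as high as the validity diagonal
        "hetero_hetero_rivals_validity": max(hetero_hetero) >= validity,
        # heterotrait values within the monomethod block as high as the reliability
        "hetero_mono_rivals_reliability": max(hetero_mono) >= reliability,
    }


# Hypothetical correlations for a single measure (illustrative only).
flags = discriminant_flags(
    validity=0.45,
    hetero_hetero=[0.12, 0.20, 0.47],
    hetero_mono=[0.50, 0.61],
    reliability=0.85,
)
print(flags)
```

With these invented figures the first flag is raised and the second is not. The comparison uses a simple "as high as" rule and ignores sampling error, which a real evaluation would have to consider.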

When a dimension of personality is 
hypothesized, when a construct is 
proposed, the proponent invariably
has in mind distinctions between the
new dimension and other constructs
already in use. One cannot define
without implying distinctions, and
the verification of these distinctions
is an important part of the valida-
tional process. In discussions of con-
struct validity, it has been expressed
in such terms as “from this point of
view, a low correlation with athletic
ability may be just as important and
encouraging as a high correlation
with reading comprehension” (APA,
1954, p. 17).

The test as a trait-method unit. In 
any given psychological measuring
device, there are certain features or
stimuli introduced specifically to
represent the trait that it is intended
to measure. There are other features
which are characteristic of the
method being employed, features
which could also be present in efforts
to measure other quite different
traits. The test, or rating scale, or
other device, almost inevitably elicits
systematic variance in response due
to both groups of features. To the ex-
tent that irrelevant method variance
contributes to the scores obtained,
these scores are invalid.

This source of invalidity was first 
noted in the “halo effects” found in 
ratings (Thorndike, 1920). Studies
of individual differences among lab-
oratory animals resulted in the recog-
nition of “apparatus factors,” usu-
ally more dominant than psychologi-
cal process factors (Tryon, 1942).
For paper-and-pencil tests, methods
variance has been noted under such
terms as “test-form factors” (Ver-
non: 1957, 1958) and “response sets”
(Cronbach: 1946, 1950; Lorge, 1937).
Cronbach has stated the point partic-
ularly clearly: “The assumption is
generally made . . . that what the
test measures is determined by the
content of the items. Yet the final
score . . . is a composite of effects re-
sulting from the content of the item
and effects resulting from the form
of the item used” (Cronbach, 1946,
p. 475). “Response sets always lower
the logical validity of a test. . . .
Response sets interfere with infer-
ences from test data” (p. 484).

While E. L. Thorndike (1920) was 
willing to allege the presence of halo 
effects by comparing the high ob- 
tained correlations with common 
sense notions of what they ought to 
be (e.g., it was unreasonable that a 
teacher's intelligence and voice qual- 
ity should correlate .63) and while 
much of the evidence of response set 
variance is of the same order, the 
clear-cut demonstration of the pres- 
ence of method variance requires 
both several traits and several meth- 
ods. Otherwise, high correlations be- 
tween tests might be explained as due 
either to basic trait similarity or to 
shared method variance. In the 
multitrait-multimethod matrix, the 
presence of method variance is indi- 
cated by the difference in level of cor- 
relation between the parallel values 
of the monomethod block and the 
heteromethod blocks, assuming com- 
parable reliabilities among all tests. 
Thus the contribution of method var-
iance in Test $A_1$ of Table 1 is indi-
cated by the elevation of $r_{A_1B_1}$ above
$r_{A_1B_2}$, i.e., the difference between .51
and .22, etc.
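
The comparison in the preceding sentence is simple arithmetic, and a short Python sketch may make it concrete. The function name is hypothetical; the two correlations are the .51 and .22 quoted in the text. The difference is only a rough index, and it presupposes comparable reliabilities among the tests, as the text notes.

```python
def method_elevation(r_monomethod, r_heteromethod):
    """Elevation of a heterotrait correlation within one method over the
    parallel heterotrait correlation across two methods."""
    return r_monomethod - r_heteromethod


# Values quoted in the text: .51 (same method) against .22 (different methods).
elevation = method_elevation(0.51, 0.22)
print(round(elevation, 2))
```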

The distinction between trait and
method is of course relative to the
test constructor’s intent. What is an
unwanted response set for one tester
may be a trait for another who wishes
to measure acquiescence, willingness 
to take an extreme stand, or tendency 
to attribute socially desirable attri- 
butes to oneself (Cronbach: 1946, 
1950; Edwards, 1957; Lorge, 1937). 


MULTITRAIT-MULTIMETHOD MATRICES
IN THE LITERATURE


Multitrait-multimethod matrices 
are rare in the test and measurement 
literature. Most frequent are two 
types of fragment: two methods and 
one trait (single isolated values from 
the validity diagonal, perhaps ac- 
companied by a reliability or two), 
and heterotrait-monomethod tri- 
angles. Either type of fragment is 
apt to disguise the inadequacy of our 
present measurement efforts, particu- 
larly in failing to call attention to the 
preponderant strength of methods 
variance. The evidence of test valid- 
ity to be presented here is probably 
poorer than most psychologists would 
have expected. 

One of the earliest matrices of this 
kind was provided by Kelley and 
Krey in 1934. Peer judgments by 
students provided one method, scores 
on a word-association test the other. 
Table 2 presents the data for the four 
most valid traits of the eight they em-
ployed. The picture is one of strong
method factors, particularly among 
the peer ratings, and almost total in- 
validity. For only one of the eight 
measures, School Drive, is the value 
in the validity diagonal (.16!) higher 
than all of the heterotrait-hetero- 
method values. The absence of dis- 
criminant validity is further indi- 
cated by the tendency of the values 
in the monomethod triangles to ap-
proximate the reliabilities.

TABLE 2
PERSONALITY TRAITS OF SCHOOL CHILDREN FROM KELLEY’S STUDY
(N=311)

                          Peer Ratings                Association Test
                      A1     B1     C1     D1      A2     B2     C2     D2
Peer Ratings
  Courtesy      A1  (.82)
  Honesty       B1   .74   (.80)
  Poise         C1   .63    .65   (.74)
  School Drive  D1   .76    .78    .65   (.89)
Association Test
  Courtesy      A2   [?]    .14    .10    .14   (.28)
  Honesty       B2   .06    [?]    .16    .08    .21   ([?])
  Poise         C2   .01    .08    .10    .02    .19    .37   (.42)
  School Drive  D2   .12    .15    .14    .16    [?]    .32    .18   (.36)

An early illustration from the ani-
mal literature comes from Anderson’s
(1937) study of drives. Table 3 pre-
sents a sample of his data. Once
again, the highest correlations are
found among different constructs
from the same method, showing the
dominance of apparatus or method
factors so typical of the whole field of
individual differences. The validity
diagonal for hunger is higher than the
heteroconstruct-heteromethod val-
ues. The diagonal value for sex has
not been italicized as a validity
coefficient since the obstruction box
measure was pre-sex-opportunity, the
activity wheel post-opportunity.
Note that the high general level
of heterotrait-heteromethod [...]
the methods would seem about as in-
dependent as one would be likely to
achieve. The predominance of an ap-
paratus factor for the activity wheel
is evident from the fact that the cor-
relation between hunger and thirst
(.87) is of the same magnitude as
their test-retest reliabilities (.83 and
.92 respectively).

TABLE 3
MEASURES OF DRIVES FROM ANDERSON’S DATA
(N=50)

                     Obstruction Box           Activity Wheel
                    A1     B1     C1         A2     B2     C2
Obstruction Box
  Hunger     A1   (.58)
  Thirst     B1    [?]   (   )
  Sex        C1    .46    .70   (   )
Activity Wheel
  Hunger     A2    .48    .31    .37      (.83)
  Thirst     B2    .35    .33    .43       .87   (.92)
  Post Sex   C2    [?]    [?]    .44       .69    .78   (   )

Note.—Empty parentheses appear in this and subsequent tables where no appropriate reliability estimates are
reported in the original paper.

TABLE 4
SOCIAL INTELLIGENCE AND MENTAL ALERTNESS SUBTEST INTERCORRELATIONS FROM
THORNDIKE’S DATA
(N=750)

                                                         Memory     Comprehension   Vocabulary
                                                       A1     B1     A2     B2      A3     B3
Memory
  Social Intelligence (Memory for Names & Faces)  A1  (   )
  Mental Alertness (Learning Ability)             B1   .31   (   )
Comprehension
  Social Intelligence (Sense of Humor)            A2   .30    .31   (   )
  Mental Alertness (Comprehension)                B2   .29    .38    .48   (   )
Vocabulary
  Social Intelligence (Recog. of Mental State)    A3   .23    .35    .31    .35    (   )
  Mental Alertness (Vocabulary)                   B3   .30    .58    .40    .48     .47   (   )


R. L. Thorndike’s study (1936) of 
the validity of the George Washing- 
ton Social Intelligence Test is the 
classic instance of invalidation by 
high correlation between traits. It in- 
volved computing all of the intercor- 
relations among five subscales of the 
Social Intelligence Test and five sub- 
scales of the George Washington 
Mental Alertness Test. The model of 
the present paper would demand that 
each of the traits, social intelligence 
and mental alertness, be measured by 
at least two methods. While this full 
symmetry was not intended in the
study, it can be so interpreted with- 
out too much distortion. For both 
traits, there were subtests employing 
acquisition of knowledge during the 
testing period (i.e., learning or mem- 
ory), tests involving comprehension 
of prose passages, and tests that in- 
volved a definitional activity. Table 4 
shows six of Thorndike’s 10 variables 
arranged as a multitrait-multimethod 
matrix. If the three subtests of the
Social Intelligence Test are viewed
as three methods of measuring social
intelligence, then their intercorrela- 
tions (.30, .23, and .31) represent 
validities that are not only lower than 
their corresponding monomethod val- 
ues, but also lower than the hetero- 
trait-heteromethod correlations, pro- 
viding a picture which totally fails to 
establish social intelligence as a sep- 
arate dimension. The Mental Alert- 
ness validity diagonals (.38, .58, and 
.48) equal or exceed the monomethod 
values in two out of three cases, and 
exceed all heterotrait-heteromethod 
control values. These results illus- 
trate the general conclusions reached 
by Thorndike in his factor analysis of 
the whole 10 × 10 matrix.

The data of Table 4 could be used 
to validate specific forms of cognitive 
functioning, as measured by the dif- 
ferent “methods” represented by 
usual intelligence test content on the 
one hand and social content on the 
other. Table 5 rearranges the 15 val- 
ues for this purpose. The mono- 
method values and the validity diag- 
onals exchange places, while the 
heterotrait-heteromethod control co- 
efficients are the same in both tables. 




TABLE 5
MEMORY, COMPREHENSION, AND VOCABULARY MEASURED WITH
SOCIAL AND ABSTRACT CONTENT

                                                    Social Content        Abstract Content
                                                   A1     B1     C1       A2     B2     C2
Social Content
  Memory (Memory for Names and Faces)        A1  (   )
  Comprehension (Sense of Humor)             B1   .30   (   )
  Vocabulary (Recognition of Mental State)   C1   .23    .31   (   )
Abstract Content
  Memory (Learning Ability)                  A2   .31    .31    .35    (   )
  Comprehension                              B2   .29    .48    .35     .38   (   )
  Vocabulary                                 C2   .30    .40    .47     .58    .48   (   )


As judged against these latter values, 
comprehension (.48) and vocabulary 
(.47), but not memory (.31), show 
some specific validity. This trans- 
mutability of the validation matrix 
argues for the comparisons within the 
heteromethod block as the most gen- 
erally relevant validation data, and 
illustrates the potential interchange- 
ability of trait and method com- 
ponents. 
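
The interchange of trait and method labels described above can be traced mechanically. The Python sketch below is an illustration only: the function `cell_type` and the label strings are hypothetical names, and no correlations are involved. It classifies each of the 15 pairs among six measures under the labeling of Table 4, then again with the two labels exchanged as in Table 5.

```python
from itertools import combinations


def cell_type(a, b):
    """Classify the correlation between two measures given as (trait, method)."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and not same_method:
        return "validity diagonal"
    if same_method and not same_trait:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"


# Table 4 reading: trait = kind of intelligence, method = kind of task.
measures = [
    (trait, method)
    for method in ("memory", "comprehension", "vocabulary")
    for trait in ("social", "abstract")
]

for a, b in combinations(measures, 2):
    before = cell_type(a, b)
    after = cell_type(a[::-1], b[::-1])  # Table 5 reading: labels exchanged
    print(a, b, "|", before, "->", after)
```

Every pair classed as a validity value under one labeling becomes a monomethod value under the other, while the six heterotrait-heteromethod pairs keep their classification, which is the sense in which those control values are the same in both tables.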

Some of the correlations in Chi’s 
(1937) prodigious study of halo effect 
in ratings are appropriate to a multi- 
trait-multimethod matrix in which 
each rater might be regarded as rep- 
resenting a different method. While 
the published report does not make 
these available in detail because it 
employs averaged values, it is appar- 
ent from a comparison of his Tables 
IV and VIII that the ratings gen- 
erally failed to meet the requirement 
that ratings of the same trait by dif- 
ferent raters should correlate higher 
than ratings of different traits by the 
same rater. Validity is shown to the 
extent that of the correlations in the 
heteromethod block, those in the 
validity diagonal are higher than the 
average heteromethod-heterotrait 
values. 

A conspicuously unsuccessful mul-
titrait-multimethod matrix is pro-
vided by Campbell (1953, 1956) for
rating of the leadership behavior of
officers by themselves and by their
subordinates.

A study of attitudes toward au-
thority and nonauthority figures by
Burwen and Campbell (1957) con-
tains a complex multitrait-multi-
method matrix. Method variance
was strong for most
of the procedures in this study.
Where validity was found, it was
primarily at the level of validity
diagonal values higher than hetero-
trait-heteromethod values. As il-
lustrated in Table 6, attitude toward
father showed this kind of validity, as
did attitude toward peers to a lesser
degree. Attitude toward boss showed
no validity. There [...]
.64 correlation between father and
boss as measured by interview might
have seemed to confirm the hypothe-
sis had they been encountered in iso-
lation.

TABLE 6
ATTITUDES TOWARD FATHER, BOSS, AND PEER, AS MEASURED BY
INTERVIEW AND CHECK-LIST OF DESCRIPTIVE TRAITS

                         Interview            Trait Check-List
                     A1     B1     C1        A2     B2     C2
Interview
(N=57)
  Father     A1    (   )
  Boss       B1     .64   (   )
  Peer       C1     .65    .76   (   )
Trait Check-List
(N=155)
  Father     A2     .40    .08    .09      (.24)
  Boss       B2     .19   −.10   −.03       .23   (.34)
  Peer       C2     .27    .11    .23       .21    .45   (.55)

TABLE 7
MULTIPLE MEASUREMENT OF TWO SOCIOMETRIC TRAITS
(N=125)

                                         Sociometric                  Observation
                                     by Others     by Self     Group Interaction   Role Playing
                                     A1     B1     A2     B2       A3     B3        A4     B4
Sociometric by Others
  Popularity      A1               (   )
  Expansiveness   B1                [?]   (   )
Sociometric by Self
  Popularity      A2                [?]    .18   (   )
  Expansiveness   B2                .07    .08    .32   (   )
Observation of Group Interaction
  Popularity      A3                .25    .18    .26    [?]     (   )
  Expansiveness   B3                [?]    [?]    .28    [?]      .84   (   )
Observation of Role Playing
  Popularity      A4                [?]    [?]    [?]    [?]      [?]    .58      (   )
  Expansiveness   B4                [?]    [?]    .26    .05      .66    .76       [?]   (   )

Borgatta (1954) has provided a
complex multimethod study from
which can be extracted Table 7, il-
lustrating the assessment of two
traits by four different methods. For
all measures but one, the highest cor-
relation is the apparatus one, i.e.,
with the other trait measured by the
same method rather than with the
same trait measured by a different
method. Neither of the traits finds
any consistent validation by the re-
quirement that the validity diagonals
exceed the heterotrait-heteromethod 
control values. As a most minimal 
requirement, it might be asked if the 
sum of the two values in the validity 
diagonal exceeds the sum of the two 
control values, providing a compari- 
son in which differences in reliability 
or communality are roughly par- 
tialled out. This condition is achieved 
at the purely chance level of three 
times in the six tetrads. This matrix 
provides an interesting range of 
methodological independence. The 
two “Sociometric by Others” meas-
ures, while representing the judg-
ments of the same set of fellow par-
ticipants, come from distinct tasks:
Popularity is based upon each par-
ticipant’s expression of his own
friendship preferences, while Ex-
pansiveness is based upon each par-
ticipant’s guesses as to the other par-
ticipants’ choices, from which has
been computed each participant’s
reputation for liking lots of other per-
sons, i.e., being “expansive.” In line
with this considerable independence,
the evidence for a method factor is
relatively low in comparison with the
observational procedures. Similarly,
the two “Sociometric by Self” meas-
ures represent quite separate tasks,
Popularity coming from his estimates
of the choices he will receive from
others, Expansiveness from the num-
ber of expressions of attraction to
others which he makes on the socio-
metric task. In contrast, the meas-
ures of Popularity and Expansiveness
from the observations of group inter-
action and the role playing not only
involve the same specific observers,
but in addition the observers rated
the pair of variables as a part of the
same rating task in each situation.
The apparent degree of method vari-
ance within each of the two observa-
tional situations, and the apparent
sharing of method variance between
them, is correspondingly high.
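
The minimal tetrad requirement mentioned in this paragraph, that the sum of the two validity values exceed the sum of the two control values, can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the correlations are invented for two traits under three methods (three tetrads) rather than taken from Table 7.

```python
def tetrad_passes(validities, controls):
    """Minimal criterion: the two validity values must sum to more than
    the two heterotrait-heteromethod control values."""
    return sum(validities) > sum(controls)


# Hypothetical correlations for traits A and B under three methods.
# Each entry holds ((r_AA, r_BB), (r_AB, r_BA)) for one pair of methods.
tetrads = {
    (1, 2): ((0.40, 0.30), (0.20, 0.25)),
    (1, 3): ((0.10, 0.15), (0.20, 0.12)),
    (2, 3): ((0.35, 0.10), (0.18, 0.22)),
}

passed = [pair for pair, (val, ctl) in tetrads.items() if tetrad_passes(val, ctl)]
print(f"{len(passed)} of {len(tetrads)} tetrads pass: {passed}")
```

If the four values of a tetrad were interchangeable, the criterion would be met about half the time, which is why three successes in six tetrads is described in the text as a purely chance level.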

In another paper by Borgatta 
(1955), 12 interaction process vari- 
ables were measured by quantitative 
observation under two conditions, 
and by a projective test. In this test, 
the stimuli were pictures of groups, 
for which the S generated a series of 
verbal interchanges; these were then 
scored in Interaction Process Analy- 
sis categories. For illustrative pur- 
poses, Table 8 presents the five traits 
which had the highest mean com- 
munalities in the over-all factor anal- 
ysis. Between the two highly sim- 
ilar observational methods, valida- 
tion is excellent: trait variance runs 
higher than method variance; valid- 
ity diagonals are in general higher 
than heterotrait values of both the 
heteromethod and monomethod
blocks, most unexceptionably so for 
Gives Opinion and Gives Orientation. 
The pattern of correlation among the 
traits is also in general confirmed. 

Of greater interest because of the 
greater independence of methods are 
the blocks involving the projective 
test. Here the validity picture is 
much poorer. Gives Orientation 
comes off best, its projective test 
validity values of .35 and .33 being 
bested by only three monomethod 
values and by no heterotrait-hetero- 
method values within the projective 
blocks. All of the other validities are 
exceeded by some heterotrait-hetero- 
method value. 

The projective test specialist may 
object to the implicit expectations of 
a one-to-one correspondence between 
projected action and overt action.
Such expectations should not be at- 
tributed to Borgatta, and are not 
necessary to the method here pro- 
posed. For the simple symmetrical 
model of this paper, it has been as-
sumed that the measures are labeled
in correspondence with the correla-
tions expected, i.e., in correspondence
with the traits that the tests are
[...] Note that in
Table 8, Gives Opinion is the best
projective test predictor of both free
behavior and role playing Shows Dis-
agreement. Were a proper theoretical
[...] these values
might be regarded as validities.

[...] (.84 and .85). The objective meas-
ures share no appreciable apparatus
overlap because they were independ-
ent operations. In spite of Mayo’s
argument that the ratings have some
valid trait variance, the .46 hetero-
trait-heteromethod value seriously de-
preciates the otherwise impressive .46
and .40 validity values.

TABLE 9
MAYO’S INTERCORRELATIONS BETWEEN OBJECTIVE AND RATING
MEASURES OF INTELLIGENCE AND EFFORT
(N=166)

                          Peer Ratings        Objective
                          A1      B1        A2      B2
Peer Rating
  Intelligence    A1    (.85)
  Effort          B1     .66    (.84)
Objective Measures
  Intelligence    A2     .46     .29      (   )
  Effort          B2     .46     .40       .10    (   )

TABLE 10

                              Verbal Items       Pictorial Items
                              A1      B1         A2      B2
Verbal Items
  Mechanical Facts    A1    (.89)
  Electrical Facts    B1     .63    (.71)
Pictorial Items
  Mechanical Facts    A2     .61     .45       (.82)
  Electrical Facts    B2     .49     [?]        [?]    ([?])

Cronbach (1949, p. 277) and Ver-
non (1957, 1958) have [...]
the multitrait-multimethod [...]
Vernon estimates that 61% of the
systematic variance is due to a gen-
eral factor, that 21½% is due to the
test-form factors specific to the
verbal or the pictorial items,
and that 11½% is due to the content fac-
tors specific to electrical or to mechan-
ical contents. Note that for the pur-
poses of estimating validity, the in- 
terpretation of the general factor, 
which he estimates from the .49 and
.45 heterotrait-heteromethod values,
is equivocal. It could represent de- 
sired competence variance, represent- 
ing components common to both elec- 
trical and mechanical skills—perhaps 
resulting from general industrial shop 
experience, common ability compo- 
nents, overlapping learning situations, 
and the like. On the other hand, this 
general factor could represent over- 
lapping method factors, and be due to 
the presence in both tests of multiple 
choice item format, IBM answer 
sheets, or the heterogeneity of the Ss 
in conscientiousness, test-taking mo- 
tivation, and test-taking sophistica- 
tion. Until methods that are still 
more different and traits that are 
still more independent are introduced 
into the validation matrix, this gen- 
eral factor remains uninterpretable. 
From this standpoint it can be seen 
that 21½% is a very minimal estimate
of the total test-form variance in the
tests, as it represents only test-form
components specific to the verbal or
the pictorial items, i.e., test-form
components which the two forms do
not share. Similarly, and more hope-
fully, the 11½% content variance is a
very minimal estimate of the total
true trait variance of the tests, repre-
senting only the true trait variance
which electrical and mechanical
knowledge do not share.
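
The ambiguity of the general factor can be put as a pair of bounds. The Python sketch below takes Vernon’s three estimates as 61%, 21½%, and 11½% of the systematic variance and shows how far the method and trait shares could each range, depending on how much of the general factor is assigned to one or the other. The variable names are hypothetical, and the bounds are an illustration of the argument, not a computation reported by Vernon.

```python
# Vernon's estimates, in per cent of systematic variance.
general, test_form, content = 61.0, 21.5, 11.5

# Lower bound: the specific component alone.
# Upper bound: the specific component plus the whole general factor.
method_bounds = (test_form, test_form + general)
trait_bounds = (content, content + general)

print("method variance between", method_bounds)
print("trait variance between", trait_bounds)
```

On these figures the method share could lie anywhere from 21.5% to 82.5%, and the trait share anywhere from 11.5% to 72.5%, which is why each specific figure is only a minimal estimate.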

Carroll (1952) has provided data
on the Guilford-Martin Inventory of
Factors STDCR and related ratings
which can be rearranged into the
matrix of Table 11. (Variable R has
been inverted to reduce the number
of negative correlations.) Two of the
methods, Self Ratings and Inventory 
Scores, can be seen as sharing method 


variance, and thus as having an in- 
flated validity diagonal. The more 
independent heteromethod blocks in- 
volving Peer Ratings show some evi- 
dence of discriminant and convergent 
validity, with validity diagonals av-
eraging .33 (Inventory × Peer Rat-
ings) and .39 (Self Ratings × Peer
Ratings) against heterotrait-hetero- 
method control values averaging .14 
and .16. While not intrinsically im- 
pressive, this picture is nonetheless 
better than most of the validity ma- 
trices here assembled. Note that the 
Self Ratings show slightly higher 
validity diagonal elevations than do 
the Inventory scores, in spite of the 
much greater length and undoubtedly 
higher reliability of the latter. In ad- 
dition, a method factor seems almost 
totally lacking for the Self Ratings, 
while strongly present for the Inven- 
tory, so that the Self Ratings come 
off much the best if true trait vari- 
ance is expressed as a proportion of 
total reliable variance (as Vernon 
[1958] suggests). The method factor 
in the STDCR Inventory is undoubt- 
edly enhanced by scoring the same 
item in several scales, thus contribut- 
ing correlated error variance, which 
could be reduced without loss of reli- 
ability by the simple expedient of 
adding more equivalent items and 
scoring each item in only one scale. 
It should be noted that Carroll makes 
explicit use of the comparison of the 
validity diagonal with the hetero- 
trait-heteromethod values as a valid-
ity indicator.


RATINGS IN THE ASSESSMENT STUDY 
OF CLINICAL PSYCHOLOGISTS 


The illustrations of multitrait- 
multimethod matrices presented so 
far give a rather sorry picture of the 
validity of the measures of individual 
differences involved. The typical 
case shows an excessive amount of 
method variance, which usually ex-
ceeds the amount of trait variance.
This picture is certainly not as a re- 
sult of a deliberate effort to select 
shockingly bad examples: these are 
ones we have encountered without at- 
tempting an exhaustive coverage of 
the literature. The several unpub- 
lished studies of which we are aware 
show the same picture. If they seem 
more disappointing than the general 
run of validity data reported in the 
journals, this impression may very 
well be because the portrait of valid- 
ity provided by isolated values 
plucked from the validity diagonal is 
deceptive, and uninterpretable in 
isolation from the total matrix. Yet 
it is clear that few of the classic ex- 
amples of successful measurement of 
individual differences are involved, 
and that in many of the instances, the
quality of the data might have been 
such as to magnify apparatus factors, 
etc. A more nearly ideal set of per- 
sonality data upon which to illus- 
trate the method was therefore 
sought in the multiple application of 
a set of rating scales in the assess- 
ment study of clinical psychologists 
(Kelly & Fiske, 1951). 

In that study, “Rating Scale A”
contained 22 traits referring to “be-
havior which can be directly observed
on the surface.” In using this scale
the raters were instructed to “disre-
gard any inferences about underlying
dynamics or causes” (p. 207). The
Ss, first-year clinical psychology stu-
dents, rated themselves and also their
three teammates with whom they
had participated in the various as-
sessment procedures and with whom
they had lived for six days. The
median of the three teammates’ rat-
ings was used for the Teammate
score. The Ss were also rated on these
22 traits by the assessment staff. Our
analysis uses the Final Pooled rat-
ings, which were agreed upon by
three staff members after discussion 
and review of the enormous amount 
of data and the many other ratings on 
each S. Unfortunately for our pur- 
poses, the staff members saw the rat- 
ings by Self and Teammates before 
making theirs, although presumably 
they were little influenced by these 
data because they had so much other 
evidence available to them. (See Kel- 
ly & Fiske, 1951, especially p. 64.) 
The Self and Teammate ratings rep-
resent entirely separate “methods”
and can be given the major emphasis 
in evaluating the data to be pre- 
sented. 

In a previous analysis of these data 
(Fiske, 1949), each of the three heter- 
otrait-monomethod triangles was 
computed and factored. To provide 
a multitrait-multimethod matrix, the 
1452 heteromethod correlations have 
been computed especially for this re-
port.² The full 66 × 66 matrix with
its 2145 coefficients is obviously too
large for presentation here, but will 
be used in analyses that follow. To 
provide an illustrative sample, Table 
12 presents the interrelationships 
among five variables, selecting the 
one best representing each of the five 
recurrent factors discovered in Fiske’s 
(1949) previous analysis of the mono- 
method matrices. (These were chosen 
without regard to their validity as 
indicated in the heteromethod blocks. 
Assertive—No. 3 reflected—was se- 
lected to represent Recurrent Factor 
5 because Talkative had also a high 


2 We are indebted to E. Lowell Kelly for 
furnishing the V.A. assessment date to us, and 
to Hugh Lane for producing the matrix of 
intercorrelations. 

In the original report the correlations were 
based upon 128 men. The present analyses 
were based on only 124 of these cases because 
of clerical errors. This reduction in N leads 
to some very minor discrepancies between 
these values and those previously reported’ 
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loading on the first recurrent factor.) 
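
The counts quoted above (1452 heteromethod correlations and 2145 coefficients in all) follow from the design of 22 traits rated by 3 methods. A brief Python check is sketched below; the function name and dictionary keys are hypothetical.

```python
from math import comb


def mtmm_counts(n_traits, n_methods):
    """Count the distinct off-diagonal coefficients in a multitrait-multimethod matrix."""
    n_measures = n_traits * n_methods
    return {
        "total": comb(n_measures, 2),
        # one heterotrait triangle per method
        "monomethod": n_methods * comb(n_traits, 2),
        # one full square block per pair of methods
        "heteromethod": comb(n_methods, 2) * n_traits ** 2,
        # validity diagonals, one value per trait per pair of methods
        "validity": comb(n_methods, 2) * n_traits,
    }


counts = mtmm_counts(n_traits=22, n_methods=3)
print(counts)
```

The monomethod and heteromethod counts together account for every coefficient, and the 66 validity values are a subset of the 1452 heteromethod values.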

The picture presented in Table 12
is, we believe, typical of the best 
validity in personality trait ratings 
that psychology has to offer at the 
present time. It is comforting to note
that the picture is better than most 
of those previously examined. Note 
that the validities for Assertive ex- 
ceed heterotrait values of both the 
monomethod and heteromethod tri- 
angles. Cheerful, Broad Interests, 
and Serious have validities exceeding 
the heterotrait-heteromethod values 
with two exceptions. Only for Un- 
shakable Poise does the evidence of 
validity seem trivial. The elevation 
of the reliabilities above the hetero- 
trait-monomethod triangles is further 
evidence for discriminant validity. 

A comparison of Table 12 with the 
full matrix shows that the procedure 
of having but one variable to repre- 
sent each factor has enhanced the ap- 
pearance of validity, although not 
necessarily in a misleading fashion. 
Where several variables are all highly 
loaded on the same factor, their 
“true” level of intercorrelation is 
high. Under these conditions, sam- 
pling errors can depress validity diag- 
onal values and enhance others to 
produce occasional exceptions to the 
validity picture, both in the hetero-
trait-monomethod matrix and in the
heteromethod-heterotrait triangles.
In this instance, with an N of 124, the
sampling error is appreciable, and
may thus be expected to exaggerate
the degree of invalidity.

Within the monomethod sections, 
errors of measurement will be cor- 
related, raising the general level of 
values found, while within the heter-
omethod block, measurement errors
are independent, and tend to lower
the values both along the validity
diagonal and in the heterotrait tri-
angles. These effects, which may also
be stated in terms of method factors
or shared confounded irrelevancies, 
operate strongly in these data, as 
probably in all data involving rat- 
ings. In such cases, where several 
variables represent each factor, none 
of the variables consistently meets 
the criterion that validity values ex- 
ceed the corresponding values in the 
monomethod triangles, when the full 
matrix is examined. 

To summarize the validation pic- 
ture with respect to comparisons of 
validity values with other hetero- 
method values in each block, Table 
13 has been prepared. For each trait 
and for each of the three hetero- 
method blocks, it presents the value 
of the validity diagonal, the highest 
heterotrait value involving that trait, 
and the number out of the 42 such 
heterotrait values which exceed the 
validity diagonal in magnitude. (The 
number 42 comes from the grouping 
of the 21 other column values and the 
21 other row values for the column 
and row intersecting at the given 
diagonal value.) 
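
The tally described above, together with the one-tailed sign test that the text later applies to it, can be sketched in Python. Several assumptions are made here and should be read as such: the function names are hypothetical, the small 3 × 3 block is invented (the real blocks are 22 × 22, giving 42 comparison values), the comparison is made on signed values, and the sign test treats the comparison values as independent with half expected above the validity value.

```python
from math import comb


def count_exceeding(block, i):
    """Count heterotrait values in row i and column i of a heteromethod
    block that exceed the validity value block[i][i].

    block[r][c] is the correlation of trait r by one method with
    trait c by the other method.
    """
    validity = block[i][i]
    n = len(block)
    others = [block[i][j] for j in range(n) if j != i]   # rest of the row
    others += [block[j][i] for j in range(n) if j != i]  # rest of the column
    return sum(v > validity for v in others), len(others)


def sign_test_p(exceptions, n=42):
    """One-tailed probability of this many or fewer values above the
    validity value when each is above it with probability 1/2."""
    return sum(comb(n, k) for k in range(exceptions + 1)) / 2 ** n


# Invented 3 x 3 heteromethod block, for illustration only.
block = [
    [0.40, 0.10, 0.45],
    [0.05, 0.30, 0.20],
    [0.12, 0.35, 0.25],
]
print(count_exceeding(block, 0))
print(sign_test_p(10), sign_test_p(11))
```

With 42 comparison values, 10 exceptions give a probability just under .001 and 11 exceptions give one just over it, which matches the cutoff of 10 or fewer exceptions used in the text.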

On the requirement that the valid- 
ity diagonal exceed all others in its 
heteromethod block, none of the 
traits has a completely perfect record, 
although some come close. Assertive 
has only one trivial exception in the 
Teammate-Self block. Talkative has 
almost as good a record, as does 
Imaginative. Serious has but two in- 
consequential exceptions and Interest 
in Women three. These traits stand 
out as highly valid in both self- 
description and reputation. Note 
that the actual validity coefficients of 
these four traits range from but .22 to 
82, or, if we concentrate on the 
Teammate-Self block as most cer- 
tainly representing independent 
methods, from but .31 to 46, While 
these are the best traits, it seems that 
most of the traits have far above 
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FABLE 13 


7 i 
IES OF TRAITS IN THE ASSESSMENT STUDY oF CLINICAL I SYCHOLOGISTS, 
‘ree AS JUDGED BY THE HETEROMETHOD COMPARISONS 


Staff-Teammate 


Staff-Self Teammate-Self 


1. Obstructiveness* 30.34 2 t6 -27 9 19 Ee k 
2. Unpredictable SE 20 0 I A 3 P 19 
aN Rasertive* Ti §.65 0 .48 as 0 .46 i 
4. Cheerful* oo 060 2 Ah AD a at ae 5 
5. Serious* -e e e re 0 
6. Cool, Aloof .49 .48 0 .20 .46 10 .02 .34 oe 
7. Unshakable Poise 20 <40 16 Kak. Bt 4 ae «ly 1 

8. Broad Interests* 47 46 0 20 aay 6 <9 4.352 0 
9. Trustful -26 .34 5 .08 AP as" 19 re | Ni By 9 
10. Self-centered 30: <4 2 a By gard? ir, 6 —.07 .19 36 
11. Talkative* 82 ©. 65 0 47 45 0 43 48 1 
12. Adventurous 45 .60 6 .28 .30 2 .16 -36 14 
13. Socially Awkward “£9: 237 0 U5 21 28 04 .16 30 
14. Adaptable* 44 (40 0 T e «Zs 10 ‘a? 329 8 
15. Self-sufficient* ‘2 - «33 1 AS 4.18 5 .18 15 0 
16. Worrying, Anxious* A -aS 0 D K ee 5 gia > (sth 1 
17. Conscientious 26 .33 4 AL a32 19 5 mr 5: 2 
18. Imaginative* 44 446 1 2 asi 0 306 239 0 
19. Interest in Women* 42 43 2 95.38 0 Of 40 1 
20. Secretive, Reserved* 40 =.58 5 38 40 2 a as 3 
21. Independent Minded Se 2 08 (25 19 ETI .30 3 
22. Emotional Expression* -62 .63 1 | .46 5 19 ° 34 10 


Note.—Val. = value in validity diagonal; Highest Het. = highest heterotrait value;
No. Higher = number of heterotrait values exceeding the validity diagonal.
* Trait names which have validities in all three heteromethod blocks
significantly greater than the heterotrait-heteromethod values.

chance validity. All those having 10
or fewer exceptions have a degree of
validity significant at the .001 level
as crudely estimated by a one-tailed
sign test.³ All but one of the variables
meet this level for the Staff-Teammate
block, all but four for the Staff-Self
block, all but five for the most
independent block, Teammate-Self.
The exceptions to significant validity
are not parallel from column to column,
however, and only 13 of 22 variables
have significant validity in all three
blocks, as indicated by an asterisk in
Table 13.

3 If we take the validity value as fixed (ignoring
its sampling fluctuations), then we can
determine whether the number of values
larger than it in its row and column is less than
expected on the null hypothesis that half the
values would be above it. This procedure
requires the assumption that the position (above
or below the validity value) of any one of
these comparison values is independent of the
position of each of the others, a dubious
assumption when common methods and trait
variance are present.

This highly significant general
level of validity must not obscure the
meaningful problem created by the
occasional exceptions, even for the
best variables. The excellent traits
of Assertive and Talkative provide
a case in point. In terms of Fiske’s
original analysis, both have high
loadings on the recurrent factor
“Confident self-expression”
(represented by Assertive in Table 12).
Talkative also had high loadings on 
the recurrent factor of Social Adapta- 
bility (represented by Cheerful in 
Table 12). We would expect, there- 
fore, both high correlation between 
them and significant discrimination 
as well. And even at the common 
sense level, most psychologists would 
expect fellow psychologists to dis- 
criminate validly between assertive- 
ness (nonsubmissiveness) and talka- 
tiveness. Yet in the Teammate-Self 
block, Assertive rated by self cor- 
relates .48 with Talkative by team- 
mates, higher than either of their 
validities in this block, .43 and .46. 
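
The one-tailed sign test described in the footnote above can be checked with a short calculation. The sketch below is illustrative only: it assumes that each validity value is compared with 42 heterotrait values (the 21 other entries in its row plus the 21 other entries in its column, for 22 traits), and the function name `sign_test_p` is hypothetical rather than anything used in the original analysis.

```python
from math import comb


def sign_test_p(exceptions: int, comparisons: int = 42) -> float:
    """One-tailed sign-test probability.

    Probability of finding `exceptions` or fewer comparison values above
    the validity value when, under the null hypothesis, each comparison
    value independently has a 0.5 chance of exceeding it.
    """
    favorable = sum(comb(comparisons, k) for k in range(exceptions + 1))
    return favorable / 2 ** comparisons


# Assumed setup: 22 traits, hence 21 + 21 = 42 comparison values.
p_ten = sign_test_p(10)
p_eleven = sign_test_p(11)
print(f"10 or fewer exceptions: p = {p_ten:.5f}")
print(f"11 or fewer exceptions: p = {p_eleven:.5f}")
```

Under these assumptions, 10 or fewer exceptions falls below the .001 level and 11 does not, which agrees with the cutoff stated in the text. The independence assumption questioned in the footnote applies equally to this calculation.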
In terms of the average values of 
the validities and the frequency of 
exceptions, there is a distinct trend 
for the Staff-Teammate block to 
show the greatest agreement. This 
can be attributed to several factors. 
Both represent ratings from the ex- 
ternal point of view. Both are aver- 
aged over three judges, minimizing 
individual biases and undoubtedly in- 
creasing reliabilities. Moreover, the 
Teammate ratings were available to 
the Staff in making their ratings. An- 
other effect contributing to the less 
adequate convergence and discrim- 
ination of Self ratings was a response 
set toward the favorable pole which 
greatly reduced the range of these 


measures (Fiske, 1949, p. 342).
Inspection of the details of the instances
of invalidity summarized in Table 13
shows that in most instances the effect
is attributable to the high specificity
and low communality for the
self-rating trait. In these instances,
the column and row intersecting at
the low validity diagonal are
asymmetrical as far as general level of
correlation is concerned, a fact covered
over by the condensation provided
in Table 13.


The personality psychologist is 


initially predisposed to reinterpret 
self-ratings, to treat them as symp- 
toms rather than to interpret them 
literally. Thus, we were alert to in- 
stances in which the self ratings were 
not literally interpretable, yet none- 
theless had a diagnostic significance 
when properly “translated.” By and 
large, the instances of invalidity of 
self-descriptions found in this assess- 
ment study are not of this type, but 
rather are to be explained in terms of 
an absence of communality for one 
of the variables involved. In general, 
where these self descriptions are in- 
terpretable at all, they are as literally 
interpretable as are teammate de- 
scriptions. Such a finding may, of 
course, reflect a substantial degree of 
insight on the part of these Ss. 

The general success in discriminant
validation coupled with the parallel 
factor patterns found in Fiske’s 
earlier analysis of the three intra- 
method matrices seemed to justify an 
inspection of the factor pattern valid- 
ity in this instance. One possible pro- 
cedure would be to do a single analy- 
sis of the whole 66×66 matrix.
Other approaches focused upon sep-
arate factoring of heteromethod
blocks, matrix by matrix, could also
be suggested. Not only would such 
methods be extremely tedious, but in 
addition they would leave undeter- 
mined the precise comparison of 
factor-pattern similarity. Correlat- 
ing factor loadings over the popula- 
tion of variables was employed for 
this purpose by Fiske (1949) but 
while this provided for the identifica- 
tion of recurrent factors, no single
over-all index of factor pattern sim-
ilarity was generated. Since our im-
mediate interest was in confirming a 
pattern of interrelationships, rather 
than in describing it, an efficient 
short cut was available: namely to 
test the similarity of the sets of heter- 
otrait values by correlation coeffi-
cients in which each entry repre-
sented the size values of the given
heterotrait coefficients in two differ-
ent matrices. For the full matrix,
such correlations would be based
upon the N of the 22×21/2 or 231
specific heterotrait combinations. 
Correlations were computed between 
the Teammate and Self monometh- 
ods matrices, selected as maximally 
independent. (The values to follow 
were computed from the original cor- 
relation matrix and are somewhat 
higher than that which would be ob- 
tained from a reflected matrix.) The 
similarity between the two mono-
methods matrices was .84, corrob-
orating the factor-pattern similarity
between these matrices described 
more fully by Fiske in his parallel 
factor analyses of them. To carry 
this mode of analysis into the hetero- 
method block, this block was treated 
as though divided into two by the 
validity diagonal, the above diagonal
values and the below diagonal repre-
senting the maximally independent
validation of the heterotrait correla-
tion pattern. These two correlated
.63, a value which, while lower, shows
an impressive degree of confirmation.
There remains the question as to 


whether this pattern upon which the 
two heteromethod-heterotrait tri- 
angles agree is the same one found in 
common between the two mono- 
method triangles. The intra-Team- 
mate matrix correlated with the two 
heteromethod triangles .71 and .71.
The intra-Self matrix correlated with 
the two .57 and .63. In general, then, 
there is evidence for validity of the 
intertrait relationship pattern. 
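
The pattern-similarity index used above (correlating the heterotrait values of one matrix with the corresponding values of another) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' computation: the function names are hypothetical and the 4-trait matrices are invented toy data, whereas the original analysis used 22 traits and hence 231 trait pairs.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def below_diagonal(matrix):
    """Entries matrix[i][j] with i > j, in a fixed trait-pair order."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def above_diagonal(matrix):
    """Entries matrix[j][i] with i > j, in the same trait-pair order."""
    return [matrix[j][i] for i in range(len(matrix)) for j in range(i)]


def monomethod_similarity(a, b):
    """Correlate the heterotrait values of two monomethod matrices."""
    return pearson(below_diagonal(a), below_diagonal(b))


def heteromethod_triangle_agreement(block):
    """Correlate the two triangles of one heteromethod block, pairing
    each trait combination above the validity diagonal with the same
    combination below it."""
    return pearson(above_diagonal(block), below_diagonal(block))


N_TRAITS = 22
N_PAIRS = N_TRAITS * (N_TRAITS - 1) // 2  # 231 heterotrait combinations

# Invented toy data for four traits (not values from the study).
teammate = [[1.0, 0.5, 0.2, 0.1],
            [0.5, 1.0, 0.3, 0.0],
            [0.2, 0.3, 1.0, 0.4],
            [0.1, 0.0, 0.4, 1.0]]
self_report = [[1.0, 0.4, 0.1, 0.2],
               [0.4, 1.0, 0.2, 0.1],
               [0.1, 0.2, 1.0, 0.3],
               [0.2, 0.1, 0.3, 1.0]]
hetero_block = [[0.45, 0.20, 0.10, 0.05],
                [0.25, 0.40, 0.15, 0.00],
                [0.05, 0.20, 0.50, 0.25],
                [0.10, 0.05, 0.30, 0.35]]

print(N_PAIRS)
print(monomethod_similarity(teammate, self_report))
print(heteromethod_triangle_agreement(hetero_block))
```

Because each entry is a trait pair rather than a person, the resulting coefficient describes agreement in the pattern of intertrait relationships, which is the sense in which the values .84 and .63 are reported above.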


DISCUSSION


Relation to construct validity. While 
the validational criteria presented are 
explicit or implicit in the discussions 
of construct validity (Cronbach &
Meehl, 1955; APA, 1954), this pa- 
per is primarily concerned with the 
adequacy of tests as measures of a 
construct rather than with the ade- 
quacy of a construct as determined 
by the confirmation of theoretically 
predicted associations with measures 
of other constructs. We believe that 
before one can test the relationships 
between a specific trait and other 
traits, one must have some confidence 
in one’s measures of that trait. Such 
confidence can be supported by evi- 
dence of convergent and discriminant 
validation. Stated in different words, 
any conceptual formulation of trait 
will usually include implicitly the 
proposition that this trait is a re-
sponse tendency which can be ob-
served under more than one experi-
mental condition and that this trait
can be meaningfully differentiated
from other traits. The testing of
these two propositions must be prior
to the testing of other propositions to
prevent the acceptance of erroneous
conclusions. For example, a con-
ceptual framework might postulate a 
large correlation between Traits A 
and B and no correlation between 
Traits A and C. If the experimenter 
then measures A and B by one 
method (e.g., questionnaire) and C 
by another method (such as the meas- 
urement of overt behavior in a situa- 


tion test), his findings may be con- 
sistent with his hy 


The requirements of this paper are
intended to be as appropriate to the
relatively atheoretical efforts typical
of the tests and measurements field
as to more theoretical efforts. This
emphasis on validational criteria ap-
propriate to our present atheoretical
level of test construction is not at all
incompatible with a recognition of
the desirability of increasing the ex- 
tent to which all aspects of a test and 
the testing situation are determined 
by explicit theoretical considerations, 
as Jessor and Hammond have advo- 
cated (Jessor & Hammond, 1957). 

Relation to operationalism. Under- 
wood (1957, p. 54) in his effective 
presentation of the operationalist 
point of view shows a realistic aware- 
ness of the amorphous type of theory 
with which most psychologists work. 
He contrasts a psychologist’s ‘‘lit- 
erary” conception with the latter’s 
operational definition as represented 
by his test or other measuring instru- 
ment. He recognizes the importance 
of the literary definition in communi- 
cating and generating science. He 
cautions that the operational defini- 
tion “may not at all measure the 
process he wishes to measure; it may 
measure something quite different” 
(1957, p. 55). He does not, however, 
indicate how one would know when 
one was thus mistaken. 

The requirements of the present 
paper may be seen as an extension of 
the kind of operationalism Under- 
wood has expressed. The test con- 
structor is asked to generate from his 
literary conception or private con- 
struct not one operational embodi- 
ment, but two or more, each as dif- 
ferent in research vehicle as possible. 
Furthermore, he is asked to make ex- 
plicit the distinction between his new 
variable and other variables, distinc- 
tions which are almost certainly im- 
plied in his literary definition. In his 
very first validational efforts, before 
he ever rushes into print, he is asked 
to apply the several methods and sev-
eral traits jointly. His literary defini-
tion, his conception, is now best rep-
resented in what his independent
measures of the trait hold distinc-
tively in common. The multitrait-

multimethod matrix is, we believe, 
an important practical first step in 
avoiding “the danger ... that the
investigator will fall into the trap of
thinking that because he went from
an artistic or literary conception
... to the construction of items for a
scale to measure it, he has validated
his artistic conception” (Underwood, 
1957, p. 55). In contrast with the 
single operationalism now dominant 
in psychology, we are advocating a 
multiple operationalism, a convergent 
operationalism (Garner, 1954; Garner, 
Hake, & Eriksen, 1956), a methodologi- 
cal triangulation (Campbell, 1953,
1956), an operational delineation 
(Campbell, 1954), a convergent valida- 
tion. 

Underwood's presentation and that 
of this paper as a whole imply moving 
from concept to operation, a sequence 
that is frequent in science, and per- 
haps typical. The same point can be 
made, however, in inspecting a tran- 
sition from operation to construct. 
For any body of data taken from a 
single operation, there is a subinfinity 
of interpretations possible; a sub- 
infinity of concepts, or combinations
of concepts, that it could represent. 
Any single operation, as representa- 
tive of concepts, is equivocal. In an 
analogous fashion, when we view the 
Ames distorted room from a fixed
point and through a single eye, the
data of the retinal pattern are equiv- 
ocal, in that a subinfinity of hexa- 
hedrons could generate the same pat- 
tern. The addition of a second view- 
point, as through binocular parallax, 
greatly reduces this equivocality, 
greatly limits the constructs that 
could jointly account for both sets of 
data. In Garner's (1954) study, the 
fractionation measures from a single 
method were equivocal: they could
have been a function of the stimulus
distance being fractionated, or they
could have been a function of the
comparison stimuli used in the judg- 
ment process. A multiple, convergent 
operationalism reduced this equivo- 
cality, showing the latter conceptual- 
ization to be the appropriate one, and 
revealing a preponderance of meth- 
ods variance. Similarly for learning 
studies: in identifying constructs 
with the response data from animals 
in a specific operational setup there is 
equivocality which can operationally 
be reduced by introducing transposi- 
tion tests, different operations so de- 
signed as to put to comparison the 
rival conceptualizations (Campbell, 
1954). 

Garner’s convergent operational- 
ism and our insistence on more than 
one method for measuring each con- 
cept depart from Bridgman’s early 
position that “if we have more than 
one set of operations, we have more
than one concept, and strictly there
should be a separate name to cor-
respond to each different set of op- 
erations” (Bridgman, 1927, p. 10). 
At the current stage of 


measurement 


t become better 
developed, it may well be appropri- 


ate to differentiate conceptually be- 
tween Trait-Method Unit A1 and
Trait-Method Unit A2, in which
Trait A is measured by different 
methods. More likely, what we have 
called method variance will be speci- 
fied theoretically in terms of a set of 
constructs. (This has in effect been 
illustrated in the discussion above in 
which it was noted that the response 
set variance might be viewed as 


trait variance, and in the rearrange- 
ment of the social intelligence ma- 
trices of Tables 4 and 5.) It will then 
be recognized that measurement pro- 
cedures usually involve several the- 
oretical constructs in joint applica- 
tion. Using obtained measurements 
to estimate values for a single con- 
struct under this condition still re- 
quires comparison of complex meas- 
ures varying in their trait composi- 
tion, in something like a multitrait- 
multimethod matrix. Mill's joint 
method of similarities and differences 
still epitomizes much about the ef- 
fective experimental clarification of 
concepts.


The evaluation of a multitrait-multi-
method matrix. The evalu


ities of the two measures involved:
e.g., a low reliability for Test A2
might exaggerate the apparent
method variance in Test A1. Again,
the whole approach assumes ade- 
quate sampling of individuals: the 
of the sample with re- 


Press the reliability coe 
Intercorrelations ; 
traits. 


t traits is the 
ard to meaningful 


: treatments 
t-multimethod Matrices 






of a value in the validity diagonal 
above the comparison values in its 
row and column. Correlations be- 
tween the columns for variables 
measuring the same trait, variance 
analyses, and factor analyses have 
been proposed to us. However, the 
development of such statistical meth- 
ods is beyond the scope of this paper. 
We believe that such summary sta- 
tistics are neither necessary nor ap- 
propriate at this time. Psychologists 
today should be concerned not with 
evaluating tests as if the tests were 
fixed and definitive, but rather with 
developing better tests. We believe 
that a careful examination of a multi- 
trait-multimethod matrix will indi- 
cate to the experimenter what his 
next steps should be: it will indicate 
which methods should be discarded 
or replaced, which concepts need 
sharper delineation, and which con- 
cepts are poorly measured because of 
excessive or confounding method var- 
iance. Validity judgments based on 
such a matrix must take into account 
the stage of development of the con- 
structs, the postulated relationships 
among them, the level of technical 
refinement of the methods, the rela- 
tive independence of the methods, 
and any pertinent characteristics of 
the sample of Ss. We are proposing 
that the validational process be 
viewed as an aspect of an ongoing 
program for improving measuring
procedures and that the “validity
coefficients” obtained at any one
stage in the process be interpreted in 
terms of gains over preceding stages 
and as indicators of where further ef- 
fort is needed. 

The design of a multitrait-multi-
method matrix. The several methods
and traits included in a validational
matrix should be selected with care.
The several methods used to measure
each trait should be appropriate to
the trait as conceptualized. Although
this view will reduce the range of 
suitable methods, it will rarely re- 
strict the measurement to one opera- 
tional procedure. 

Wherever possible, the several 
methods in one matrix should be com- 
pletely independent of each other: 
there should be no prior reason for 
believing that they share method 
variance. This requirement is neces- 
sary to permit the values in the heter- 
omethod-heterotrait triangles to ap- 
proach zero. If the nature of the 
traits rules out such independence 
of methods, efforts should be made to 
obtain as much diversity as possible 
in terms of data-sources and classifi- 
cation processes. Thus, the classes of 
stimuli or the background situations, 
the experimental contexts, should be 
different. Again, the persons provid- 
ing the observations should have dif- 
ferent roles or the procedures for 
scoring should be varied. 

Plans for a validational matrix 
should take into account the differ- 
ence between the interpretations re- 
garding convergence and discrimina- 
tion. It is sufficient to demonstrate 
convergence between two clearly dis- 
tinct methods which show little over- 
lap in the heterotrait-heteromethod 
triangles. While agreement between 
several methods is desirable, conver- 
gence between two is a satisfactory
minimal requirement. Discrimina- 
tive validation is not so easily 
achieved. Just as it is impossible to 
prove the null hypothesis, or that 
some object does not exist, so one 
can never establish that a trait, as 
measured, is differentiated from all 
other traits. One can only show that 
this measure of Trait A has little 
overlap with those measures of B and 
C, and no dependable generalization 
beyond B and C can be made. For 
example, social poise could probably 
be readily discriminated from aes-
thetic interests, but it should also be 
differentiated from leadership. 

Insofar as the traits are related and 

are expected to correlate with each 
other, the monomethod correlations 
will be substantial and heteromethod 
correlations between traits will also 
be positive. For ease of interpreta- 
tion, it may be best to include in the 
matrix at least two traits, and prefer- 
ably two sets of traits, which are 
postulated to be independent of each 
other. 

In closing, a word of caution is 
needed. Many multitrait-multi- 
method matrices will show no con- 
vergent validation: no relationship 
may be found between two methods 

In this common 
situation, the experimenter should 
examine the evidence in favor of sev-
eral alternative propositions: (a)
Neither method is adequate for meas-

the nontrait attributes of each test.
The failure to demonstrate conver- 
gence may lead to conceptual devel- 
opments rather than to the abandon- 
ment of a test. 


SUMMARY 


This paper advocates a valida- 
tional process utilizing a matrix of 
intercorrelations among tests repre- 
senting at least two traits, each meas- 
ured by at least two methods. Meas- 
ures of the same trait should correlate 
higher with each other than they do 
with measures of different traits in- 
volving separate methods. Ideally, 
these validity values should also be 
higher than the correlations among 
different traits measured by the same 
method. 
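
The two comparisons stated in this paragraph can be expressed as a small check. What follows is a hedged sketch rather than a procedure given in the paper: the names `kind` and `evaluate` are hypothetical, the six correlations are invented, and each validity value is compared only with the heterotrait correlations that share one of its two trait-method units (the values in its own row and column).

```python
def kind(u, v):
    """Classify the correlation between two distinct (trait, method) units."""
    same_trait, same_method = u[0] == v[0], u[1] == v[1]
    if same_trait and not same_method:
        return "validity"
    if not same_trait and same_method:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"


def evaluate(corr):
    """For every validity value, report whether it exceeds all
    heterotrait values of each kind that share a unit with it."""
    report = {}
    for pair, r in corr.items():
        u, v = sorted(pair)
        if kind(u, v) != "validity":
            continue
        passed = {"heterotrait-heteromethod": True,
                  "heterotrait-monomethod": True}
        for other, r2 in corr.items():
            if other == pair or not (other & pair):
                continue  # skip itself and correlations sharing no unit
            a, b = sorted(other)
            k = kind(a, b)
            if k in passed and r2 >= r:
                passed[k] = False
        report[(u, v)] = passed
    return report


# Invented example: traits A and B, each measured by methods 1 and 2.
A1, B1, A2, B2 = ("A", 1), ("B", 1), ("A", 2), ("B", 2)
corr = {
    frozenset({A1, A2}): 0.60,  # validity value for Trait A
    frozenset({B1, B2}): 0.55,  # validity value for Trait B
    frozenset({A1, B1}): 0.40,  # different traits, same method (1)
    frozenset({A2, B2}): 0.45,  # different traits, same method (2)
    frozenset({A1, B2}): 0.15,  # different traits, different methods
    frozenset({B1, A2}): 0.20,  # different traits, different methods
}

for validity_pair, outcome in evaluate(corr).items():
    print(validity_pair, outcome)
```

In this toy matrix both validity values pass both comparisons. Raising the Method 1 heterotrait correlation above .60 would make the monomethod comparison fail for both traits, which corresponds to the large method contributions discussed in the next paragraph.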

Illustrations from the literature 
show that these desirable conditions, 
as a set, are rarely met. Method or 
apparatus factors make very large 


contributions to psychological meas-
urements. 


The notions of convergence be- 
tween independent measures of the 
same trait and discrimination be- 

Investigations extending over
many years on the conditioned reflex 
activity of the higher animals and 
man have shown that “innumerable 
variations in both the external and 
internal media of the organism, each 
of which is reflected in certain states 
of the cortical nerve cells, may them- 
selves become individual conditioned 
stimuli” (Pavlov, 1947, p. 51).

It is quite natural that time, as one 
of the fundamental attributes of the 
existence of matter, and being closely 
linked with all variations in the ex- 
ternal and internal media of the or- 
ganism, should be very frequently 
found to be one of the components of 


t ulus, emerging 
in the role of an exciter of conditioned 


3 m of the or- 
ganism. In consequence of the link- 


of h time, the 
conditions are created for the reflec- 


ve reality in 
or the forma- 
xes to certain 
segments of time, as was demon- 
strated experimentally by the investi- 


1 This article was first published in Uspekhi
Sovremennoi biologii, 1955, 40, 31-51. It was
translated by the Pergamon Institute for the
Russian Scientific Translation Program of the
National Institutes of Health, U. S. Public
Health Service. Reprints of this article may
be obtained by writing directly to the Editor
of the Psychological Bulletin.

gations of I. P. Pavlov and his co-
workers. 

The formation of conditioned re- 
flexes to time plays an important part 
in the systemic activity of the cere- 
bral cortex, in the development of 
definite periodicity in physiological 
functions, and in the establishment of 
rhythmical pattern reactions in the 
working activity of man. By virtue 
thereof the question of the cortical 
mechanisms involved in the process 
of formation of conditioned reflexes 
to time acquires considerable theoret- 
ical and practical importance. The 
present work is concerned with the 
analysis and generalization of find- 
ings given in the literature and the re- 
sults of our own investigations. 


I 
The fact that conditioned reflexes 
are formed 


lished in Pavlov’s laboratory, in the 
experiments of G. P. Zelenyi (1907),
and later, those of K. N. Krzhish-


; » working on the for- 
mation of conditio 
In the work of K. N. Krzhishkov-
skii on an investigation of condi- 
tioned inhibition it was established 
that the application of inhibition con-
stantly at the nineteenth to twentieth
minute led to the following events: 
the conditioned stimulus always 
proved inactive at the nineteenth to 
twentieth minute, but manifested its 
effect at the thirty-third to thirty- 
fourth minute. Krzhishkovskii ob- 
served this phenomenon over a period 
of six months. Wishing to determine 
whether time was the factor deter- 
mining inhibition under these circum- 
stances, he set up experiments for the 
establishment of a conditioned sali- 
vary reflex when a combination of the 
unconditioned stimulation (solution 
of hydrochloric acid) with the con- 
ditioned stimulus was administered 
constantly every 10-13 minutes. It 
was found that “if the animal’s oral
cavity was stimulated at equal inter-
vals of time, then,” in the author's
words, “with the passage of time
such a state developed that at the ap- 
propriate moment, even in the ab- 
sence of any obvious stimulations, a 
flow of saliva began, and frequently 
also there was a characteristic motor 
reaction.” 

The importance of time as a factor
in the establishment of conditioned
reflexes has also been noted by other
investigators in Pavlov’s laboratory
in works referable to the same period.
Thus, observations by I. P. Pimenov
(1907), I. V. Zavadskii (1908), F. S.
Grossman (1909), and V. M.
Dobrovol'skii (1911) showed that, in the
case of trace and delayed reflexes,
reaction to the conditioned stimulation
occurred only after a certain time
from the commencement of its action,
that is, after the lapse of the segment
of time which habitually elapsed
before reinforcement was administered.
In other words, the conditioned reflex


was formed not simply to a particular 
stimulus, but to stimulus plus a cer- 
tain segment of time. 

These various investigations justi- 
fied consideration of the question of 
time as a stimulus of signal signifi- 
cance, and of the power of the cortical 
cells to reckon time. And even at 
that time Pavlov, on the basis of 
existing factual material, advanced a 
hypothesis on the mechanism of time 
reckoning in the central nervous
system. 

“It may be thought,” he wrote,
“that intensity analysis, in part at 
least, is a basic element in the meas- 
urement of time by animals. One can 
speculate whether any external agent 
of uniform, constant strength acts on 
the particular analyser of the animal, 
and whether there is gradual fading 
in the nerve cells of the residuum, 
the trace of actual stimulation which 
has ceased; each intensity of the stim- 
ulated state of the cell, at each sepa- 
rate moment of time, is an individual 
element, differing from both all pre- 
ceding and all following stages of in- 
tensity. Time is perhaps measured by 
these elements as units, and every 
moment of time recorded in the 
nervous system” (1947, pp. 115-116). 

Investigations dealing specially 
with the role of time in the formation 
of conditioned reflexes were first car- 
ried out by Iu. P. Feokritova (1912) 
at Pavlov’s suggestion. “We pro- 
posed to determine,” she wrote, 
“whether time... could be linked, 
like all phenomena in the external en- 
vironment, with the activity of the 
salivary gland, and whether it could 
serve as a specific salivary stimulus. 
If this proved to be the case, we 
wished to determine how rapidly the 
link between a definite time and the 
work of the salivary gland could be 
established, and thereafter to study 
to the fullest possible extent the 
characteristic features of the new re-
flex to time” (1912, p. 17). 
With these ends in view, Iu. P. 
Feokritova established conditioned 
salivary reflexes in dogs, repeating 
the combination of conditioned stim- 
ulus (sound of the metronome) and 
unconditioned (feeding of meat- 
biscuit powder or the introduction of 
hydrochloric acid into the mouth) at 
predetermined intervals of time (in 
the case of one dog, every 30 minutes; 
for another every 15 minutes; and in 
the case of a third, every 10 minutes). 
In a number of the experiments only 
one reinforcement was administered 
—at the same intervals of time. 

A stable reflex to time was formed 
after 200-230 repetitions (the number 
varied for the different animals), and 
was evidenced by the fact that after 
a certain interval of time, and just 
before the time for the next reinforce- 
ment, saliva secretion began. The 
author also noted that, before the 
formation of the stable conditioned 
reflex to time, there was irregular sal- 
ivation in the intervals betw.
forcements, Later, 
to be limited in ti 
half of the intery 


Salivation in the 
reinforcements was, however, not in- 
frequent (intersigna! 
after the formation 
tioned reflex to time, 

According to Feokritova’s observa- 
tions, the conditioned reflex to “pure
time” was formed more rapidly than 
when time was combined with an 
auditory stimulus. She explains this 
on the grounds that time, as an inde- 
pendent stimulus, is very weak, and 
that the auditory stimulus overshad- 
ows its action. 

Examining the effect of the metro- 
nome in combination with different 
intervals of time (except that to
which the conditioned reflex was 
formed), Feokritova found that the 
effect of the metronome, when not 
associated with a definite interval of 
time, was negligible. Thus, in the dog
in which a conditioned reflex to a 30- 
minute interval was established the 
operation of the metronome at the 
fourteenth minute induced secretion 
of only one drop of saliva, whereas at 
the thirtieth minute the action of the 
same metronome was accompanied
by the secretion of 6-11 drops. The 
author concludes from this that “in 
our summated reflex, time is the more 
active excitant and consequently also 
the main component, and not the 
metronome” (1912, p. 57). 

By her specially designed experi- 
ments Feokritova established that
the differentiation of time can be 
brought to such a degree of accuracy 
that, for example, when a conditioned 
reflex has been established with a 30- 
minute interval, salivation is timed 
to occur exactly at the thirtieth min-
ute, and is completely absent a min-
ute earlier.

In her next series of experiments
Feokritova tested the action of vari-
timuli (whistle, ven- 
one) on the course of 
uli became attenuated, or even disap-
peared. When powerful extraneous 
stimuli were allowed to act and when 
any sudden change was introduced 
into the course or setting of the exper- 
iments, time differentiation was com- 
pletely destroyed. 

In this connection it is fitting to re- 
call the observations of the author 
on the effect of a somnolent state, 
which the animals frequently de- 
veloped in the course of the experi- 
ments, on the course of the condi- 
tioned reflex. Feokritova noted that 
a somnolent state not only failed to 
weaken the conditioned reflex to 
time, but even increased the differ- 
entiating power of the animal. 

Feokritova’s observations showed 
that extinction of the conditioned re- 
flex to time occurred suddenly. Not- 
ing the similarity in this respect to 
the extinction of trace reflexes, the 
author assigns the conditioned reflex 
to time to the group of trace reflexes. 

Feokritova concluded from her in- 
vestigations that time can be an ex- 
citant of conditioned reflex activity 
in the salivary gland, and that the 
formation of a conditioned reflex to 
time was subject to the same laws as 
the formation of conditioned reflexes 
to other stimuli. 

In offering an explanation of the 
mechanism for the reckoning of time 
by the nervous system, Feokritova 
wrote: “It is known that, after every
stimulation, there remains in the
cerebral cortex the trace of a whole
series of states of stimulation of the
nerve cell, gradually diminishing in
intensity. At every individual moment,
in each short interval of time,
the intensity of the nerve cell
stimulation will be different from what it is
at any other moment. If we take the
intensity of nerve cell stimulation
during each short interval of time as
an independent unit of stimulation,


it can be stated that the exciter of the 
salivary centre in the case of a time 
reflex is that degree of intensity that 
always coincides in time with the act 
of eating” (1912, pp. 162-163).
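
The logic of this explanation can be illustrated with a toy calculation. The sketch is purely illustrative and rests on assumptions not found in the source: the trace is taken to fade exponentially, the time constant of 20 minutes is arbitrary, and both function names are hypothetical. The hypothesis as quoted requires only that the trace diminish steadily, so that each moment corresponds to a different intensity.

```python
from math import exp, log


def trace_intensity(minutes, initial=1.0, time_constant=20.0):
    """Intensity of the fading trace `minutes` after stimulation.

    Exponential decay is an assumption made for illustration; any
    strictly decreasing function would serve the argument equally well.
    """
    return initial * exp(-minutes / time_constant)


def elapsed_from_intensity(intensity, initial=1.0, time_constant=20.0):
    """Invert the decay: recover elapsed minutes from trace intensity."""
    return -time_constant * log(intensity / initial)


# The intensity that coincides with feeding on a 30-minute schedule
# acts, on this account, as the exciter of the salivary centre.
signal_level = trace_intensity(30.0)

for minute in (14, 29, 30):
    level = trace_intensity(minute)
    print(minute, round(level, 4), level <= signal_level)
```

Because the assumed decay is strictly monotonic, no two moments share an intensity, which is what would allow a single level to stand for a single interval, as in the differentiation between the twenty-ninth and thirtieth minutes reported earlier.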

The work of M. M. Stukova (1914) 
was a direct continuation of Feokri- 
tova’s investigations, and she con- 
firmed the fact that a conditioned re- 
flex could be established to time (a 
20-minute interval in combination 
with mechanical skin stimulation), 
the time reflex being formed very 
rapidly in her experiments (after 65 
repetitions). In dogs which had been 
transferred to her from Feokritova, 
Stukova noted the complete reten- 
tion of the conditioned reflexes and 
time differentiation established ear- 
lier. 

In her own experiments Stukova 
tested the effect of quite a range of 
different factors on the course of the 
conditioned reflex to time. Above all, 
she tested the effect of faradic cur- 
rents of different strengths. Her ob- 
servations demonstrated that the ef- 
fect depended on the current strength 
and the characteristics of the ani- 
mals’ nervous systems. In the dog 
with well-developed inhibition proc- 
esses currents of various strengths 
had no appreciable effect on the con- 
ditioned reflex, and only at the com- 
mencement of current action during 
the first application was some very 
transient disturbance of time differ- 
entiation observed. In excitable dogs 
inhibition of the conditioned reflex to 
time and disturbance of the correct 
reckoning of time were observed as a 
result of the action of faradic current, 
particularly if strong. A powerful 
current, acting momentarily, inhib- 
ited the reflex to time in all animals. 
Thus, like Feokritova, Stukova noted 
the inhibitory effect of change in the 
setting and course of the experiment 
on the conditioned reflex.

In her next series of experiments
Stukova examined the effect of caffeine
(0.05 gm.) and of cocaine (0.02-0.03
gm.) on conditioned reflexes to
time. After subcutaneous injection 
of caffeine, all the animals showed 
(after 8-11 minutes) increased ex- 
citability, disturbance in the reckon- 
ing of time, and disinhibition of time 
differentiation. Under the action of 
cocaine, in addition to increased ex- 
citability, the appearance of continu- 
ous salivation in the intervals be- 
tween reinforcements was observed. 
The reckoning of time by the nervous 
system was upset, and differentiation 
was disordered. 

Stukova established in special ex- 
periments that the conditioned reflex 
to time appears (although varying in 
the degree of its expression) in re- 
sponse to any of the components in 
the summated stimulus (e.g., the
sight or smell of food only, without
the reinforcement, the metronome
alone, or the mechanical skin
stimulation alone).

“The reckoning of time by the animals,”
Stukova concluded, “is possible,
not only from a compound stimulation,
that is, from the sum of many
stimuli, as in our case (natural
ali[...] any one of
these, taken separately” (p. 136).


The study of conditioned
time reflexes was continued by V. S.
Deriabin [...] same animals.
When he started work with these animals,
he first of all noted that in all
the dogs the conditioned reflexes to 
time produced earlier were restored 
at the first attempt and immediately 
attained their former magnitude. 
Deriabin then tested the effect of 
a three-month interruption in the 
work. He demonstrated in this way 
that, with the lapse of three months,
all the dogs manifested disorder of
time reckoning.

Change in the experimental setting 
(particularly isolation of the animal 
from the experimenter) had the same 
effect on the reflex. In this case the 
time reflex and time differentiations 
disappeared, to reappear only after 
prolonged repetition. 

Deriabin made a detailed investi- 
gation of the effects of sodium bro- 
mide and chloral hydrate on the re- 
flex to time; these substances were in- 
jected into the animal’s rectum, dis- 
solved in distilled water. Sodium 
bromide, 2 gm., had no effect on the 
magnitude of the conditioned reflex 
to time: “The intensification of the 
processes of internal inhibition,” 
wrote the author in this connection, 
“which is usually seen with sodium 
bromide in this quantity, did not lead 
to increase in the power of the dog's
central nervous system to discriminate
time. Time differentiation
remained unchanged” (p. 94). Chloral
hydrate, 0.5 gm. to [...]
appreciable effect on t[...]

[...] both sides. In both cases the author
noted temporary disappearance after
the operation of the conditioned reflex
to the corresponding stimulus
(“kololka,” metronome) in combination
with time. With the lapse of a
certain period, however, the conditioned
reflex to time was restored
completely: the conditioned reflex to
“pure time” was quite clearly expressed
in one dog seven days, and in
the other two days after the operation.

In his own work Deriabin then examined
the question of whether the
nervous system of the dog is capable
of discriminating between the time
properties of sound and its other
properties. For this purpose he set
up experiments in which conditioned
reflexes were established to various
sounds (tuning-fork, gurgling sound,
whistle) with rhythmical intervals
([...] seconds’ sound, two seconds’
interval). Differentiation of the
given rhythm from rhythms of the
same sound, but with longer or
shorter sounds and intervals, was
achieved. These experiments
showed that rhythm differentiation
of this type, established to one sound
stimulus (whistling), appeared immediately
when other sound stimuli
(buzzer, tuning-fork) were administered.

The author therefore concluded
that “the dog is capable of distinguishing
the time properties of an
auditory stimulus from other properties
of sound, and is guided by them
in its physiological activity,” and
that “the nervous system of the dog
is capable of correlating similar
time properties of several sound stimuli
as something common to
various types of sounds” (p. 141).

We see, therefore, that the works of
Feokritova, M. M. Stukova, and
V. S. Deriabin have provided a
simple experimental basis for Pavlov’s
hypothesis on time as a true
excitant of conditioned reflex activ- 
ity in animals. 

“It can be stated,” wrote Pavlov
in connection with Feokritova’s ex- 
periments, “that in this particular 
case time was the conditioned stim- 
ulus” (1947, p. 50). This stimulus 
“is in no way less real,” he pointed
out, “than all the preceding stimuli” 
(1947, p. 49). 

“Physiologically, how are we to 
understand time as a conditioned 
stimulus?” asked Pavlov. In answer- 
ing this question, he started from the 
fact that in ordinary life we note time 
with the aid of certain cyclical phe- 
nomena, the setting and rising of the 
sun, the movements of hands on the 
dials of clocks, etc. And in our own 
bodies there are not a few of these 
cyclical phenomena. In the course of 
a day the brain receives stimula- 
tions, becomes exhausted and then is 
restored again. The digestive canal 
is periodically filled with food, and 
periodically emptied. And in that 
every state of an organ can be re- 
flected in the cerebral hemispheres, 
we have there a basis for distinguish- 
ing one moment of time from another. 
Let us take short intervals of time. 
When stimulation has just been ad- 
ministered it is felt very acutely. 
When we enter a room in which there 
is some odor or other, we at first feel 
it very strongly, but afterwards less 
and less. Under the influence of 
stimulation the state of the nerve cell 


undergoes a series of changes. It is
exactly the same in the converse
case. When stimulation ceases, this
is felt very acutely at first, and then
less and less, until finally we are
unaware of it. This again means that
there are a number of different
states of the nerve cell. Cases of
reflexes to interruption of the stimulus
and trace reflexes, as well as cases of
reflexes to time can be explained 
from the same point of view. In the 
experiment referred to (an experi- 
ment of Feokritova) the animal was 
fed periodically, and in association 
therewith a number of organs mani- 
fested a certain activity, that is, they 
underwent certain successive changes. 
All this was registered in the cerebral 
hemispheres, was received by them, 
and the definite moment of these 
changes became a conditioned stim- 
ulus” (1947, p. 50-51). 

After the work of Feokritova, 
Stukova, and Deriabin, salivary con- 
ditioned reflexes to time were ex- 
amined by a number of authors. 
These investigations, as well as giv- 
ing greater definition to the condi- 
tions attending the formation of re- 
flexes to time, demonstrated their 
importance in the systemic activity 
of the cerebral cortex. 


Thus, F. D. Vasilenko (1932),
working with a pattern of conditioned
food stimuli, repeated at identical
intervals of time, noted increase
from experiment to experiment in the
conditioned reflex reaction to one of
the weak stimuli in the pattern
(light). The author established that
this increase in the secretory effect to
light was the result of the summation 
of the conditioned reflex to light with 
the reflex to time which was being 
developed. 

F. P. Maiorov (1933) also de- 
scribes the formation of a conditioned 
reflex to time in relation to the action 
of a pattern. The author, having 
established a conditioned salivary re- 
flex to the sound of the metronome 
(60 beats per minute) and a motor 
conditioned reflex to a different 
metronome frequency (120 beats per 
minute), began to alternate the action
of these stimuli at equal intervals
of time (six minutes). Comparatively
soon (after nine repetitions)
the author was able to establish a
conditioned reflex to time, in which 
at first, in complete independence of 
the nature of the conditioned stimu- 
lation, both salivation and motor re- 
action appeared towards the end of 
the six-minute interval. Later, how- 
ever, the conditioned reflex to time 
became differentiated; towards the 
end of the sixth minute, before the 
operation of the metronome (120 per 
minute), the motor reaction ap- 
peared, and towards the end of the 
sixth minute before the operation of 
the metronome (60 beats per min- 
ute), salivation began. 

E. G. Vatsuro (1948), employing a 
pattern of stimuli, established in 
dogs, over a period of several months 
without interruption, a conditioned 
reflex to time, which manifested it- 
self in the secretion of saliva 15-30 
seconds before the beginning of the 
action of the conditioned stimulus, 
but subsequently the moment of the 
commencement of salivation coin- 
cided more and more closely with the 
moment of action of the conditioned 
stimulus. On other days feeding of 
the animal was carried out at the 
same intervals of time as in the preceding
experiments, but without
administration of the conditioned
stimuli. It was found that the
reflex to time r[...]
of the conditi[...]tion exactly.
“[...] reproduction,” [...]
“reached the point of identity
of the ratio of the individual
reflexes [...]” (1948, p. 13).

The importance of the time factor
in the systemic activity of the cerebral
cortex has been noted in their
respective works by P. S. Kupalov
(1929, 1931, 1933), B. A. [...]
(1938) and others.

The features in the formation of
conditioned reflexes to time, originally
discovered in experiments on
dogs by the salivary secretion 
method, have been given greater defi- 
nition by investigations on the motor 
conditioned reflexes to time in these 
animals. 

One of the first descriptions of the 
motor-defence conditioned reflex to 
time in dogs is found in the work of 
I. S. Beritov (1932). In experiments 
on dogs he repeated electrical stimu- 
lation, causing a defensive reflex 
flexion of a front limb, every five 
minutes and found that a conditioned 
reflex to time was formed after 40 
repetitions. This reflex manifested 
itself in the fact that a minute before 
the next stimulation was due, the 
animal (which up to this point had 
been in a somnolent state) woke up, 
shook its head, and raised the limb. 

In the literature we meet with very
diverse findings on the question of 
the rate of, and conditions for, the 
formation of conditioned motor re- 
flexes to time in dogs. Thus, in the 
work of L. S. Gambarian (1952) there 
is evidence of the very rapid forma- 
tion of the motor-defence conditioned 
reflex to time in the case of a three- 
minute intersignal interval (unfor- 
tunately, the author, whose investi- 
gation had different aims, did not 
dwell in detail on his description of 
the process of formation of the condi- 
tioned reflex to time). On the other 
hand, O. P. Bolotina (1952a) states 
that with a three-minute interval the 
conditioned reflex to time is formed 
much more slowly, and sometimes 
fails completely to develop. Thus,
having established a conditioned reflex
to pressure with a forepaw on the
[...] of a special apparatus in dogs,
and combining it with food reinforcement
at 10-minute intervals, Bolotina
noted the formation of a conditioned
reflex to time only after 18 experiments
(180 combinations). The attempt
to transfer the conditioned reflex
for the 10-minute intervals to
shorter intervals (1-3 minutes) was 
successful only in rare cases. The 
author noted the formation of an un- 
stable conditioned reflex to a two- 
minute interval after 52 experiments 
(520 combinations) in only one of 
three dogs. In the other dogs this 
was not achieved; a stable condi- 
tioned reflex could not be formed 
even to a three-minute interval, de- 
spite a large number of repetitions 
(344). In these experiments Bolotina 
noted only the appearance of a large 
number of intersignal reactions which, 
with repetition of the experiments, 
did not diminish in number, and did 
not become concentrated in the sec- 
ond half of the intersignal interval, as 
occurred in the course of the forma- 
tion of the conditioned reflex to the 
10-minute interval. The difficulty, 
and in a number of cases the impossi- 
bility, of forming conditioned reflexes 
to short intervals of time is due, in the 
opinion of the author, to the fact that 
in the case of a very short interval of 
time the traces of stimulation fail, 
against a background of alimentary 
excitation, to become concentrated, 
but, being summated, radiate through 
the cortex, and manifest themselves 
as a mass of intersignal pressures 
(1952a, p. 32). 

According to the observations of 
Bolotina (1952a), a motor condi- 
tioned reflex to time is formed more 
readily with a compound stimulus 
than to “pure” time. Extinction of 
the conditioned reflex to time was 
reached after three to five applications
without reinforcement. Examining 
the effect of extraneous stimuli on 
the conditioned reflex to time, the 
author found that the operation of
extraneous stimuli immediately before
the reinforcement or in the first
seconds of its application caused
inhibition of the conditioned reflex to
time in a number of cases, but their
operation anywhere in the middle
part of the intersignal interval
usually failed to cause any appreciable
changes in the conditioned reflex to 
time. Bolotina’s observations indi- 
cate that changes in the experimental 
setting or interruptions in the work 
produced very marked disturbances 
in the conditioned reflex to time. Spe- 
cial experiments made by this author 
showed that hunger disturbed the 
animal’s conditioned reflex to time, 
the manifestations being the appearance
of intersignal motor reactions.

In another series of experiments
Bolotina (1953) investigated the ef- 
fect of bromide and caffeine on motor 
conditioned reflexes to time in dogs. 
This work showed that bromide in 
doses of 0.5-1.0 gm. hastened the 
formation of the conditioned reflex 
to time, and at the same time created 
conditions for the formation of a con- 
ditioned reflex to shorter intervals of 
time (two minutes), to which previ- 
ously a conditioned reflex to time 
could not be established. When the 
conditioned time reflex was dis- 
turbed (as, for example, in attempts 
to transfer it to shorter intervals of 
time), the use of bromide in the same 
dose promoted its restoration and 
stabilization. In doses of 2-3 gm., 
bromide intensified the conditioned 
reflex to time. The use of bromide in 
combination with caffeine had a more 
pronounced positive effect on conditioned
reflexes to time in dogs.

Motor-defence conditioned reflexes
to time have been studied in detail 
by A. M. Kochigina in our labora- 
tory. Her experiments showed that a 
motor-defence conditioned reflex to 
“pure” time (electrical stimulation 
repeated every five minutes) was 
formed very slowly, passing through 
certain stages in the course of its
development. The process of conditioned
time reflex formation began
with the appearance of intersignal
motor reactions, the number of which
increased every time, reaching 50-60
in each five-minute interval. Within
the limits of the five-minute interval
the intersignal reactions were irregularly
distributed, as the dotted line
curve in Fig. 1 shows. This curve has
a markedly undulant character, which
means that the groups of intersignal
reactions alternated with pauses of
longer or shorter duration, in which
the intersignal reactions were absent
or scanty. This initial stage in the
formation of the conditioned time reflex
we have called the stage of generalized
conditioned reflex to time.

In the next stage the number of
intersignal reactions begins to diminish,
and they become less and less
frequent in the first half of the
five-minute interval until they are no
longer observed in this period, being 
concentrated in the second half of the 
interval, and increasing in number as 
the moment for the next reinforce- 
ment approaches. The change in the 
number of intersignal reactions oc- 
curring in the five-minute interval 
during this stage is shown by the
continuous line curve in Fig. 1. We see
that it is shorter by half than the
dotted line, being displaced into the
second half of the interval, and shows
a notable rise towards the moment of
the next reinforcement. We have
termed this stage in the formation of
the conditioned reflex to time the
stage of formation of the differentiated
conditioned reflex to time.
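
The two stages described above differ in how the intersignal reactions are spread over the five-minute interval. As an illustrative sketch only, the distribution could be tabulated as below. The reaction times are invented for the example and are not Kochigina's data, the 30-second bin width is an assumption taken from the caption of Fig. 1, and the function names are hypothetical.

```python
# Illustrative sketch: tabulate intersignal reactions within one
# five-minute interval. All numbers below are hypothetical.

INTERVAL_S = 300  # five-minute intersignal interval, in seconds
BIN_S = 30        # assumed bin width (30-second periods, as in Fig. 1)


def bin_reactions(times_s, interval_s=INTERVAL_S, bin_s=BIN_S):
    """Count reactions per bin; times are seconds since the last stimulation."""
    counts = [0] * (interval_s // bin_s)
    for t in times_s:
        if 0 <= t < interval_s:  # ignore anything outside the interval
            counts[int(t // bin_s)] += 1
    return counts


def second_half_fraction(counts):
    """Share of all reactions that fall in the second half of the interval."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[len(counts) // 2:]) / total


if __name__ == "__main__":
    generalized = [10, 40, 95, 130, 170, 200, 260, 290]   # hypothetical
    differentiated = [180, 220, 250, 270, 285, 295]       # hypothetical
    for name, times in (("generalized", generalized),
                        ("differentiated", differentiated)):
        counts = bin_reactions(times)
        print(name, counts, second_half_fraction(counts))
```

In this toy example the "generalized" series is spread over the whole interval, whereas the "differentiated" series lies entirely in the second half and rises towards the moment of the next reinforcement.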

Fig. 1. Time is plotted on the abscissa
from the moment of application of the
conditioned stimulus; on the ordinate,
the number of intersignal reactions in
every 30-second [...] five-minute
intersignal interval. [...]

Subsequently, the number of intersignal
reactions continues to diminish
until they disappear completely.
Only in the few seconds preceding the
moment of the next reinforcement is
there the occasional appearance of a
motor reaction. Thus, a more or less
precise conditioned reflex to
time has been formed. It was formed
after [...]0-135 repetitions. The accuracy
of the time reckoning after
the formation of the differentiated
conditioned reflex to time (expressed
as the ratio of the time to the appearance
of motor reactions to the total
duration of the intersignal interval)
ranged from 80 to 93 per cent in our
dogs.

The duration of the individual
stages and their relationship to one
another in the process of the formation
of a conditioned reflex to time
differ in the different animals, as
will be readily seen from Fig. 2. The
differences in the duration of the second
stage were particularly noticeable,
and a certain parallelism was
observed between the duration of
this stage and the total time required
for the formation of the differentiated
conditioned reflex to time.
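
The accuracy measure quoted above for the differentiated reflex (the time to the appearance of motor reactions, as a proportion of the total intersignal interval) can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function name and the example latencies are hypothetical, and the text does not say how the original authors handled edge cases.

```python
def time_reckoning_accuracy(first_reaction_s: float, interval_s: float) -> float:
    """Latency of the first anticipatory reaction as a percentage of the interval.

    first_reaction_s: seconds from the previous stimulation to the first
                      motor reaction (hypothetical measurement).
    interval_s:       total duration of the intersignal interval in seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not 0 <= first_reaction_s <= interval_s:
        raise ValueError("first_reaction_s must lie within the interval")
    return 100.0 * first_reaction_s / interval_s


if __name__ == "__main__":
    # Invented latencies for a five-minute (300-second) interval.
    for latency in (240, 279):
        print(latency, time_reckoning_accuracy(latency, 300))
```

On this reading, with a 300-second interval a first reaction at 240 seconds corresponds to 80 per cent and one at 279 seconds to 93 per cent, which brackets the range reported for the dogs.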

Fig. 2. [...] the total time required
for the formation of the differentiated
reflex to time.

The motor-defence conditioned reflex
to time was formed in the dogs
more rapidly to the compound stimulus
(sound plus the five-minute interval)
than to “pure” time (after 64
and 97 repetitions respectively).
During the process of the formation
of the conditioned time reflex to the
compound stimulus exactly the same
stages were observed, but the number
of intersignal reactions under these
experimental conditions was considerably
less. The accuracy of time
reckoning after the formation of the
differentiated conditioned reflex to
the compound stimulus was the same
as for “pure” time.

In the dogs the conditioned reflex
to time was extinguished very rapidly
(after 4-5 unreinforced applications).
After a 10-day interruption, the
conditioned time reflex was restored
after 10-15 repetitions, the picture of
restoration of the conditioned reflex
recalling that of its formation, but
with a more rapid sequence of stages.

We were able to confirm the findings
of our predecessors relative to
the effect of extraneous stimuli,
bromide and caffeine, on the conditioned
reflex to time. According to
our observations, powerful extraneous
stimuli caused temporary loss of
time differentiation, and led to the
appearance of a large number of
intersignal reactions. Bromide, two
grams, reduced the number of intersignal
reactions, and accelerated the
process of the formation of a conditioned
reflex to time. Caffeine in
doses of 0.5-1.0 gm. led to the
appearance of a [...] time reckoning.

We were unable to establish a
motor-defence reflex [...] intervals
of less than one minute, despite
[...] 500). [...]
between stimulations we observed a
large number of intersignal reactions 
in all the animals, increasing with 
each successive experiment. Parallel 
with this, the animals became in- 
creasingly restless with each succeed- 
ing experiment, whimpering and 
barking and trying to bite their way 
out of their harness, until finally 
further attempts had to be aban- 
doned. 

In addition to salivary and motor 
conditioned reflexes to time, the con- 
ditioned reflex changes in metabo- 
lism, induced in dogs through definite 
time intervals, were also investigated. 
The work of K. M. Bykov and his 
colleagues on this subject demon- 
strated that a conditioned reflex to 
time could also be formed on the 
basis of metabolic changes. Thus, 
G. V. Nesterovskii and A. D. Slonim 
(1936) investigated thermal polyp- 
noea in dogs (that is, accelerated 
respiration developed in response to a
rise in the environmental tempera- 
ture), having established a condi- 
tioned reflex (to the sound of the 
metronome) on the basis of this un- 
conditioned reflex reaction. The 
authors then used the metronome 
every five minutes, and observed reg- 
ularly recurring accesses of polypnoea 
at five-minute intervals, even in the 
absence of metronome action. R. P.
Ol’nianskaia and A. D. Slonim (1938)
observed the formation of conditioned
reflexes to time in metabolic
investigations on dogs. The animals
were kept for five hours in a cold
room (temperature: 10° C.) or in a
warm room (temperature: 22° C.), with
corresponding changes in metabolism
naturally resulting, and were then
returned to the kennels in which the
temperature was below that of the
warm, but higher than that of the
cold room. Repeating these experiments
systematically, the authors
noted that toward the end of the five- 
hour stay the metabolism of dogs in 
the warm room began to increase, 
and that of dogs in the cold room, to 
fall. 

Considering this to be the result of 
the formation of time associations, 
K. M. Bykov remarked that “these 
associations were established to a 
definite interval of time, that preced- 
ing the transfer of the dog to the ken- 
nels in which the animals’ metab- 
olism adjusted itself to the tempera- 
ture of the surroundings” (1947, p. 
137). This level of metabolism in the 
kennels was higher than in the room 
when heating was used in the experi- 
ments, but was lower than in the 
room in the cold experiments. Sim- 
ilar results were obtained in these ex- 
periments when the animals were 
kept in a room with a neutral tem- 
perature (15-16° C.) and were then 
transferred to a warm (22° C.) or a 
cold (10° C.) room. Here, the metab- 
olism of the dog about to be trans- 
ferred to a warm room fell, and that 
of the dog transferred to the cold 
room rose. 

We have seen, then, that condi- 
tioned reflexes to time can be formed 
in dogs on the basis of quite a variety 
of unconditioned reactions. The in- 
vestigations of conditioned time reac- 
tions in dogs have shown that these 
conditioned reflexes are formed with 
much greater difficulty than condi- 
tioned reflexes to other stimuli. They 
are less stable, are easily inhibited by 
the action of extraneous stimuli or by 
change in the experimental setting, 
and are rapidly extinguished in the 
absence of reinforcement. In the 
mechanism of their formation, condi- 
tioned time reflexes resemble trace 
reflexes. They are formed to trace 
stimulations in a variety of analysers.
These traces, recurring at equal
intervals of time, form a single complex
stimulus which, in combination with 
the basic reaction, becomes the ex- 
citer of this reaction. Like all condi- 
tioned reflexes, the conditioned reflex 
to time in dogs has a generalized 
character in the initial stage of its 
formation, and only later does it be- 
come more or less differentiated. 


II 


Conditioned reflexes to time, so 
thoroughly studied in dogs, have also 
been examined in other animals, and 
also in man.

P. M. Nikiforovskii (1929) investigated
conditioned reflexes to time in
tortoises. In one tortoise he established
a conditioned motor reflex to
the smell of carnation oil, with the
conditioned stimulus acting every
three minutes. A conditioned reflex
to “pure” time was established in
another tortoise, the an[...]
a minute [...] carnation
oil smell, and the second, on each
occasion, a short time before the
expiration of the four-minute period,
irrespective of what [...]
(in movement or at rest). The author
remarked that conditioned reflexes to
time in tortoises are distinguished by
their extreme instability; they are
extinguished by the omission of only
one reinforcement, but their
re-establishment requires repeated
reinforcement. He also drew attention to the
ease with which conditioned time reflexes
in tortoises are inhibited by
extraneous stimuli (e.g., by the sound of
a passer-by’s footsteps).


B. I. Baiandurov (1937) made a de- 
tailed study of conditioned reflexes to 
time in birds (pigeons). His experi- 
ments were carried out in the follow- 
ing manner. The electrodes of an in- 
duction apparatus, in the primary 
circuit of which there was a timing 
device closing the circuit after a defi- 
nite interval, were brought into 
contact with the foot of a pigeon, 
placed in a special chamber. Move- 
ment of the foot in response to stimu- 
lation by the current was recorded by 
means of a lever on the smoked drum 
of a kymograph. Describing the results
of his investigations, the author
noted particularly that it was impossible
to produce a conditioned reflex
to short intervals of time (5-15 minutes)
in birds, despite a large number
of repetitions. Even the formation
of a conditioned reflex to an interval
of 30 minutes took place quite
slowly (after 300 repetitions). The
conditioned time reflex in birds was
unstable and was rapidly extinguished
(after two to three omissions of
reinforcement). The conditioned reflex
to time, when formed, was not always
retained until the following day.
Alcohol abolished the conditioned
reflex to time. In the course of the
formation of a conditioned reflex to
time in birds, Baiandurov noted the
appearance of involuntary movements
[...] the intervals between the
electrical stimulations (on the 25th


[...] closure, [...]

The formation of conditioned reflexes
to time in various animals has
been noted in a number of works by
K. M. Bykov and his colleagues in
connection with the analysis of different
types of periodicity in
physiological functions. Thus, L. G.
Filatova (1949), examining the daily
periodicity in the long-eared hedgehog,
found that in these animals
the greatest muscular activity occurred
during the night-time (1600-0200
hours) and activity fell to
[...] in the daylight hours.
Concurrent with this, there was an
increase in the secretion of gastric
juice in the evening and night hours.
“This whole picture,” wrote the
author, “can be regarded as the formation
of a conditioned association
to time, elaborated in the process of
[...] in connection with the
[...]” (1949, p. 199).
By giving food to the hedgehogs
[...] during daytime,
Filatova established a new conditioned
reflex to time, as a result of which the
animals manifested two periods of
motor and secretory activity, one by
day and one by night. From his
observations the author concluded that
conditioned linkage to time is a
fundamental element in the daily
periodicity of rest and activity in this
nocturnal animal. The formation of
conditioned reactions to time in
long-eared hedgehogs was confirmed by
the investigations of L. A. Isaakian
(1953a), who investigated the conditioned
reflex thermoregulation mechanism
of these animals. When the
environmental temperature had
been combined a number of times
with the particular setting and time
of the experiment, the author found
that for some days after change of
temperature the former level of
metabolism was maintained. For example,
when the animals were kept
for 20 days in a chamber with the
temperature at 22-25° C., the body
temperature and oxygen consumption
became established at a definite
level. When the animals were transferred
to a room with a temperature
of 10-12° C., no noticeable change in
their metabolic rate was observed
during the first two to three days, an
event which the author correctly
regarded as the result of the formation
of a conditioned reaction to the
experimental setting and time of the
previous experiments. In the absence
of reinforcement, the extinction
of the time association so formed
took place, according to his observations,
on the third to fifth day.

Natural conditioned reflexes to
time, associated in their origin with 
the times of taking food, have been 
described by L. A. Isaakian (1953b) 
from his investigations on rabbits. 
His observations showed that in rab- 
bits the temperature of the ears rises 
regularly during the hours of feeding. 
For example, the ears of rabbits 
which were fed at 1500 hours gen- 
erally showed a raised temperature 
from 1200 to 1800 hours, after which 
the temperature fell abruptly. The 
same daily variations in ear tempera- 
ture were seen in hungry animals, 
that is, in rabbits which had not been
fed at the established hour. When
the daily regime was changed (transfer
of the feeding time from 1500 to
2000 hours), a new rhythm of ear
temperature variation was formed,
the maximum rise of temperature
falling within the period 2000 to 0200
hours.

The importance of the develop- 
ment of conditioned reflexes to time 
in the creation of a daily periodicity 
in basal metabolism was noted also 
by A. D. Slonim (1945) and by A. G. 
Ponugaeva (1949) in experiments on 
bats. 

K. B. Svechin [...]
changes in the daily [...]
physiological functions [...]
that, in young [...]
from the conditions of maintenance
in stalls to field life, a [...] type of
daily fluctuation in pulse rate and
[...] developed, and that this
was maintained on days when the external
conditions (variations in the
outside temperature, in relative
atmospheric humidity, etc.) were
considerably changed from those prevailing
on the preceding days. In exactly
the same way also, animals
stall conditions retained for several 
days the periodicity in physiological 
functions established while they were 
kept in the open. The author ex- 
plains the results of his investigations 
as indicating that a definite rhythm 
of changes in the environmental con- 
ditions creates a conditioned reflex 
to time in the animals. 

The formation of conditioned re- 
flexes to time under various condi- 
tions in monkeys is described in a 
number of works by K. M. Bykov
and his colleagues.

The formation of the conditioned 
reflex to time in monkeys (Abyssinian
baboon, M. rhesus, M. lapunder)
was observed in an unusual form by 
O. P. Shcherbakova (1937, 1938, 
1949). By artificially reversing the 
alternation of light and darkness, the 
author was able to reverse the animal’s
daily periodicity. By creating
two changes of light and darkness in
the 24 hours, Shcherbakova established
a regime of two “nights” and
two “days” in the course of 24 hours.
It is a curious fact that, on transition 
to the previous normal light regime, 
the periodic changes in functions cor- 
responding to the prolonged, artifi- 
cially created periods of light and 
darkness, were maintained for some 
time. “It seemed to me completely
logical to assume,” wrote K. M.
Bykov, in connection with these
observations, “that the regular
occurrence of alternation of day and night
(irrespective of whether they are
natural, astronomical, or created
artificially for the monkeys by
alternation of the periods of lighting)
likewise creates a conditioned reflex
to time in the cerebral cortex” (1947,
p. 153).

Later K. P. Ivanov, A. R. Maka- 
rova, and A. A. Fufacheva (1953) con- 
tinued these investigations of Shcher- 
bakova on the effect of periodic al- 
ternation of light and darkness on 
the periodicity of physiological func- 
tions in monkeys (macaques, ba- 
boons, long-tailed monkeys). Not- 
ing that the level of gaseous exchange 
was lowest at the time twilight ap- 
proached, these authors regarded 
twilight illumination as the signal 
stimulus, being a component of the 
daytime lighting and occurring also 
at the boundary between the two 
astronomical periods, day and night, 
which determine different conditions 
in the organism—the transition of 
the animal from daytime activity to 
nighttime rest. This signal stimulus 
enters into the natural conditioned 
reflex pattern. The authors note that 
when artificial twilight was created 
30 minutes before the advent of nat- 
ural twilight, the fall in metabolism 
at the usual time of twilight was 
maintained unchanged for the first 
two days, and only on the third day 
did a complete change in metabolism,
corresponding to the new light
conditions, develop. “Thus,” concluded
the authors, “twilight not only causes 
a fall in metabolism and is of active 
inhibitory character, but is also 
closely associated with a conditioned 
time reflex, involving the entire daily 
periodicity in the physiological
functions of the organism” (p. 101).

Analysing the daily periodic fluctu- 
ations in the blood sugar content of 
monkeys (M. rhesus, Abyssinian
baboon), E. S. Kanfor (1949) found that
an increase in blood sugar occurred 
in relation to the feeding times, and 
regarded this as the result of the de- 
velopment of a conditioned reflex to 
time. 

The formation of a conditioned re- 
flex to the feeding time in monkeys 
was also noted by L. G. Voronin 
(1948). He observed the formation 
of a conditioned reflex to the time of 
feeding, manifested by the appear- 
ance of sucking movements and gen- 
eral motor unrest towards the times 
of feeding in young rhesus monkeys 
at about the third week. 

L. G. Voronin (1951) described the 
formation of conditioned linkages to 
time in monkeys (baboons, ma- 
caques, long-tailed monkeys) in ex- 
periments in which a system of stim- 
uli repeated at regular intervals was 
used. Thus, when the conditioned 
food-seeking reflex was repeated every
two minutes, it was very soon noted 
that toward the end of the second 
minute after the administration of 
the stimulus the monkey ran to the 
feeding box and tried to reach the 
food. After establishment of the 
conditioned reflex to the stereotype 
of stimuli, operating at minute inter- 
vals, it could often be seen that, even 
in the experiment without use of the 
conditioned stimuli (“sham” experi- 
ment), the animal ran to the food box 
every minute and pressed the lever. 
At first the conditioned reaction to 
time was not absolutely accurate 
(the animal applied pressure 10-15 
seconds before the end of the minute 
interval), but with continued feeding 
of the animal every minute, the reflex 
to time became progressively more 
and more exact. 

According to the findings of Voronin
and his colleagues, the conditioned
reflex to “pure” time (in which
only feeding of the animal was carried
out at regular intervals) was
more difficult to establish in mon- 
keys. For example, in rhesus mon- 
keys establishment of a conditioned 
reflex to “pure” time (1 minute) re- 
quired about 400 feeding reinforce- 
ments. In Voronin’s opinion, the con- 
ditioned reflex to time in these cases 
is formed much more quickly, but is 
masked by the conditioned reflex to 
the setting. 

When the intersignal intervals are 
changed from one to two or three
minutes, the conditioned reflex to the 
former time is preserved in the first 
two or three experiments, but fairly 
soon thereafter (after 6-10 experi- 
ments) a conditioned reflex to the 
new time is established. In monkeys 
it is much more difficult to establish 
conditioned reflexes when the inter- 
signal intervals are prolonged to 
three-five minutes. When the inter- 
val between stimuli was considerably 
lengthened, a large number of inter- 
signal reactions became evident, and 
these Voronin regarded as the mani- 
festation of the earlier established 
conditioned reflexes to time and as 
a result of disinhibition of the condi- 
tioned reflex to the setting. The op- 
timum interval between stimuli for 
the formation of a conditioned reflex 
to time in monkeys is thought by the 
author to be one-two minutes. 

Voronin regarded the suppression 
of the conditioned reflex to the stimu- 
lus preceding the differential signal in 
the stereotype of stimuli, which was 
frequently observed in monkeys, as a
special case of a conditioned reflex to
time (“retrograde inhibition” of E. A.
Asratian, 1934). In the experiments
of Voronin and his co-workers, the
system of stimuli was so arranged
that every third or fourth (and
sometimes sixth) stimulus was a
differentiation stimulus. Consequently the
second, third, or fifth stimulus was,
in Voronin’s opinion, not only the
signal of a positive conditioned reflex, 
but was also a signal for the following 
differentiation, so that sometimes 
these signals acquired also a negative 
signal significance. “We suggest,”
wrote the author, “that we are dealing
here with an instance of that
power of the nervous system to
‘reckon time,’ which shows itself in
the form of a conditioned reflex”
(1951, p. 221).

O. P. Bolotina (one of Voronin’s 
co-workers) gives a detailed descrip- 
tion in her work (1952b) of the for- 
mation of conditioned reflexes to 
time in monkeys. According to her 
observations, such a conditioned re- 
flex is readily formed, and begins to 
appear even on the first experimental 
day, but the time required for its 
stabilization increases as the inter- 
signal interval is lengthened. The 
conditioned reflex forms more slowly 
to “pure” time than to a compound 
stimulus (time plus auditory or 
photic stimulation). In the latter 
case Voronin employed a pattern of 
sound and light stimuli, following 
each other at regular intervals, while 
reinforced and unreinforced stimuli 
recurred regularly in a fixed order.
When the stimuli were suppressed 
the conditioned reflex reaction ap- 
peared at the same intervals, but in 
the places of the unreinforced stimu- 
lations the reflex was unstable. When 
the unreinforced stimulations were 
excluded from the pattern, the condi- 
tioned reflex to time was more stable. 
The conditioned reflex to “pure” 
time (the interval between stimuli 
being one minute) appeared toward 
the end of the third experiment, but 
was stabilized only after 450 repeti- 
tions. 

In a special series of experiments 
Bolotina made observations on 
change in the time interval. These 
experiments showed that the longer
the time interval was, the greater
was the number of repetitions re- 
quired for stabilization of the condi- 
tioned reflex to time. On transition 
from one interval to another, the first 
two days showed a conditioned reflex 
to the former time interval, although 
even from the very beginning a con- 
ditioned reflex to the new interval 
was also in evidence. On the follow- 
ing days inhibition of both the old 
and the new conditioned reflexes was 
observed, and this the author ex- 
plains as the manifestation of an ori- 
entation reaction to the new arrange- 
ment of the experiment. From the 
third day intersignal reactions ap- 
peared in large number, but these 
were gradually controlled and be- 
came more and more concentrated 
toward the end of the new time inter- 
val. The rapid formation of condi- 
tioned reflexes to time in monkeys 
and their facile transition from one 
time interval to another are ex- 
plained, according to the author, by 
the great mobility of the nervous 
processes in these animals. 
That the conditioned reflex to time
in monkeys is more stable than in
other animals was indicated by the
absence of any appreciable inhibitory
effect from the operation of an
extraneous stimulus (whistle) towards
the end of the intersignal interval.
During extinction of the conditioned
reflex to time a large number of
intersignal reactions first appeared, and
only later (after from [...] hr. 6 min.
[...]) [...] develop. [...] wave-like [...]
reflex was [...] easily re-established.
[...] conditioned reflex to [...] change
in the experimental [...], for example,
delay of [...]

In her own subsequent investigations
O. P. Bolotina (1953) found
that small doses of bromide (0.2- 
0.35 gm.) had a certain controlling 
effect on conditioned reflex activity, 
evidenced by a reduction in the num- 
ber of intersignal reactions. Larger 
doses (2-3 gm.) exercised a sustained
positive action on the reflex to time.
A mixture of bromide and caffeine 
had a more definite, positive effect. 
Caffeine, in doses of 0.02-0.07 gm., 
produced an increase in the number 
of intersignal reactions in all the
monkeys.

According to the findings of O. P.
Bolotina and A. A. Popova (1953),
phenamine, in a dose of 0.3 mg./kg.,
led to slight disturbance of time
reckoning in the monkeys in the daytime,
with increase in the number of
intersignal reactions, whereas during
nighttime it rendered the conditioned
reflex to time more exact. In a dose
of [...] mg./kg., it caused considerable
[...] of reflexes to time during
[...], [...] in the hours of night it
[...] the reflexes and made
[...] exact.

[...] of Voronin and his co-workers [...]
motor [...] coming from the [...]
on trace stimulations from
the digestive tract, these being [...]
in relation to identical
[...] of time. These traces [...]
cortex are built up into a [...] state of
[...]. This state becomes
a conditioned [...] for the nervous
[...] which adjusts the entire
organism to meet the material
conditions of the [...] (Bolotina,
1952b, [...]).

The formation of conditioned
reflexes to time has been established
for animals of quite a variety of
classes (reptiles, birds, mammals),
and among mammals, for representatives
of various divisions (insectivores,
cheiropterids, rodents, carnivores,
artiodactyls, monkeys). If we
may judge from the similarity of the 
process of formation of conditioned 
reflexes to time in the different ani- 
mals (the relative difficulty of their 
production in comparison with con- 
ditioned reflexes to other stimuli, 
their facile inhibition, their rapid ex- 
tinction in the absence of reinforce- 
ment, etc.), the nervous mechanism 
involved in the process is basically 
the same in all vertebrates. In the 
natural living conditions of many 
animals the formation of conditioned 
reflexes to time is of considerable im- 
portance in the development of a pe- 
riodicity of physiological functions, 
typical for the particular animals. 
Investigations on the conditioned re- 
flexes to time in various animals con- 
firm Pavlov's view on the mechanism 
for time reckoning by the nervous 
system, and confirm that the power 
to reckon time is inherent in the
cortical cells of all analysers, and that
it is unnecessary to postulate the
existence of a special cortical “time
analyser,” as has been suggested by
several physiologists (P. M.
Nikiforovskii and others).
The results of investigations on
conditioned reflexes to time in animals
may find [...]zation of rational
maintenance and econ[...]tion
of farm animals.
[...] feeding, watering, moving cattle [...]
establishment of cond[...]
the times of these processes;
[...] establishment of a complex time [...]
determined, not only by the fact that
it instils order into and facilitates the 
work of man in caring for animals, 
but also because the positive physi- 
ological effect of the measures being 
pursued is considerably enhanced 
thereby. Time, as a unique exciter of 
reflex activity, is capable (in conjunc- 
tion with other stimuli) of creating 
and strengthening a definite course in 
physiological processes, of giving 
them a definite rhythm, a definite in- 
terrelationship in time. This un- 
doubtedly creates more favorable 
conditions for the course of each of 
the processes. It is for this reason 
that the observation of strict regu- 
larity in the feeding times of animals 
increases the productivity of the 
food, that the strict maintenance of a 
regular milking routine increases the 
yield, and so on. 

In State farm and collective farm 
practice measures are taken to create 
a constant regime of animal
maintenance: times and routines for
feeding are laid down (this is often done
separately for different groups of
animals), and definite times are fixed
for exercising the animals, for milking
cows, etc. Numerous examples of
these measures are given in the work
of A. V. Kvasnitskii and V. A.
Koniukhova (1954). In agreement 
with the results of laboratory in- 
vestigations on conditioned reflexes 
to time, their formation and con- 
solidation in practical animal farming 
occurs more rapidly when the condi- 
tioned reflex is established not to 
“pure” time, but to a complex stim- 
ulus. In practical work also an at- 
tempt is made to combine the time 
of a particular productive process 
with some kinds of concomitant sig- 
nals. In some cases, for example, 
feeding time was linked with the 
rapid striking of a rail, exercise time 
with slow striking, and so on. Very
often such concomitant signals,
strengthening the conditioned reflex 
to time that had been formed, were 
verbal stimuli (being, of course, only 
sound stimuli to the animals). Thus, 
the arrival of exercise time was linked 
with the word “walk,” the arrival of 
the milking time for each cow with 
the pronouncing of its name, etc. 
Change in the established regime, 
that is, a change from some earlier 
established conditioned time reflexes 
to others, presents a certain amount 
of difficulty for the nervous system, 
as can be judged from the results of 
laboratory investigations. The same
has been observed in animal hus- 
bandry under natural conditions. 
A. V. Kvasnitskii and V. A. Koniu- 
khova (1954) illustrate this by the fol- 
lowing example. Special investiga- 
tions showed that in pigs under the 
usual feeding conditions (morning, 
noon and evening) the secretion of 
gastric juice is maximum towards the 
feeding times, and falls during the 
rest of the time, particularly during 
the night. When the feeding times 
were changed (evening, midnight and 
early morning) the previous condi- 
tioned reflex to time, that is, increase 
in the secretion of gastric juice at the 
times of the previous day’s feeding, 
was preserved for several days, and 
only gradually was a reflex to the new 
feeding times established. Such a 
reconstruction of the time pattern 
entailed a period of disturbance in 
the functioning of the coordinated 
organs, and indeed any disturbance
of the established regime produced 
the same result. Kvasnitskii and 
Koniukhova cite as an example the 
reduction in the milk-yield by five 
per cent and in the fat content of the 
milk to 0.2-0.4 per cent that resulted 
from milking 30-49 minutes later 
than the usual time.

[...] agreement with the views of Pavlov,
that“... dynamic changes in nerve 
traces, as the ‘differentials’ in the 
reckoning, constitute the funda- 
mental element in the reckoning of 
time” (1951, p. 837), Frolov pointed 
out that the speed with which traces 
from stimuli are extinguished may 
vary with the state of the cortical 
cells, and that various errors in the 
assessment of time are, in his opinion, 
connected therewith. 

According to his findings, a tend- 
ency to over-assess time (within lim- 
its of five-eight per cent) was con- 
stantly observed in experiments on 
healthy individuals of various ages 
and occupations. The author ex- 
plained this by the development of 
internal inhibition in the subjects 
under the experimental conditions 
(darkness, silence), in consequence 
of which the trace from the signal- 
stimulus, from which the reckoning 
of time began, was damped down 
more rapidly than usual. In other 
words, trace processes reached a certain
degree of intensity earlier than
they should, and therefore the true
interval of time seemed longer to the
subject (“time was extended”). The
same was observed, according to the
author, in the case of anticipation.
When the excitability of the cortical
cells was heightened, on the other
hand, time was under-assessed.

Frolov described pronounced [...]
the cerebral cortex, and
rapid suppression of traces from the
stimuli.

“From a practical point of view it
is important to note,” the author con[...],
“[...] of the subjects’ health, the degree of
their adaptation to work, and also the 
presence of emotional elation or de- 
pression were immediately reflected 
in the accuracy of their time reckon- 
ing, inducing acceleration or slowing 
in the calculation, over-assessment or 
under-assessment of the passage of 
time” (1951, p. 838). 

During the establishment of a 
rhythmical pattern, the formation of 
conditioned reflexes to time in man 
was noted and examined in connec- 
tion particularly with the analysis of 
one of the phenomena observed dur- 
ing the process, and first described by 
V. M. Bekhterev (1908). 

Examining the motor reactions of 
man to rhythmically recurring 
sounds, Bekhterev observed that,
after sudden cessation of the sound,
the subject continued to execute several
movements at the former rate.
Also, as the period over which the
sounds were in operation increased,
and as the rate at which they fol- 
lowed one another was raised, so the 
number of motor reactions executed 
after the cessation of the sound
increased. Subsequently, G. P. Zelenyi
(1923) made a special study of this 
phenomenon by setting the subject to 
beat in time with a metronome (120 
beats per minute). When the metro- 
nome was stopped (after 120 beats) 
the subject made several additional 
movements at the same rate, and this 
occurred even when the subject was 
asked not to make any excess
movements. The author also observed
that when, after an interruption in 
the rhythmical work, the metronome 
gave one beat, the subject made two 
or several movements in succession 
with the same intervals between them 
as between the beats of the metronome.
In a joint work with B. N.
Kadykov, G. P. Zelenyi (1937) found
that the motor reactions very often
outstripped the conditioned signals,
and developed before the next suc- 
ceeding beat of the metronome. Ac- 
cording to the observations of K. M. 
Bykov (1925), the rhythm of the 
movements does not always coincide 
with the rhythm of the metronome; 
some subjects offer slower move- 
ments, some more rapid, and some 
again make movements at an irregu- 
lar rate, while in only 49 per cent of 
the subjects was the rate of the move- 
ments coincident with the beats of 
the metronome. According to Bykov’s
findings, the number of excess
movements after cessation of the 
metronome did not exceed three in 
the majority of subjects (76 per cent), 
although in some cases it reached 
nine. Both Bykov and Zelenyi regard
these “excess” movements as the re- 
sult of the development of a condi- 
tioned reaction to time. 

Detailed investigations of the same
phenomenon were carried out re[...]
[...] after the conditioned signal, to be
replaced by heightened excitability as
the moment for the action of the next 
signal approached. This the author 
regarded as a “latent” manifestation 
of the conditioned reaction to time. 
Assuming that the development of a 
conditioned reaction to time is de- 
termined by the degree of concentra- 
tion of the conditioned excitation in 
the period leading up to the next rein- 
forcement, the author explains the 
differences in the formation of this 
reaction at different intervals as fol- 
lows: in the presence of large intervals 
following negative induction, excita- 
bility increases slowly, while in the 
presence of short intervals it rises 
steeply. Since between the condi- 
tioned signal (beat of the metronome) 
and the complex stimulus which in- 
duces the conditioned reaction to 
time, there occur, according to the 
author, active induction relation- 
ships, it follows that in the case of 
longer intervals (as a result of the 
slow increase in excitability), the
external signal proves stronger, and the
reaction to time is inhibited, whereas
with short intervals (in consequence
of the steeper rise in cortical
excitability) “. . . this state of the cortical
cells emerges as an independent
conditioned stimulus, inhibiting the
reaction to the sound and inducing the
motor reaction” (Alekseev, 1953, p.
896).

The formation of conditioned re- 
flexes to time has been noted also in 
the course of investigations on the 
daily periodicity in the physiological 
functions of man. Thus, investigations
of daily variations in the
composition of the blood in man have
shown that in many people the num- 
ber of leucocytes increases at the 
usual hours of eating (Voronov & 
Riskin, 1925; Orlova, 1937), and that 
this increase is the result of the
formation of a conditioned reflex to the
time of taking food (Belen’kii, 1949). 
A. G. Urin and E. S. Zenkevich 
(1952) stated that not less than
six-seven days were required for the
formation of a conditioned leucocytic
reflex to the time of taking food. When
the time of eating was changed, the 
conditioned leucocytic reflex was ex- 
tinguished after two-four days, and 
the longer the conditioned reflex had 
taken to become established, the 
slower was its extinction. The extin- 
guished reflex was restored rapidly on 
return to the previous conditions of 
eating. 

The features attaching to the for- 
mation of a conditioned reaction to 
time in man were investigated in our 
laboratory by A. S. Dmitriev. The 
observations were made on children 
aged 8 to 14 years. Conditioned 
motor reactions were established in 
the subjects to sounds with verbal 
reinforcement, the conditioned stim- 
ulus being repeated at equal inter- 
vals of time (25-30 seconds). 

In many of the children the forma- 
tion of a conditioned reaction to time 
began with the appearance of inter- 
signal motor reactions (after 5-13 
repetitions), and indeed these reac- 
tions appeared also at times later 
(every five-six repetitions or even 
more frequently) at quite varied 
times in the intersignal interval. The 
character of the distribution of all the 
intersignal reactions in one of the 
subjects in the first experiment is
shown by the dotted line curve in
Fig. 3. In subsequent experiments,
however, the number of intersignal
reactions diminished, and appeared
mainly in the second half of the inter- 
signal interval (as is shown by the 
continuous line curve in Fig. 3). Fi- 
nally, a more or less differentiated 
conditioned reaction to time was 
formed, and the intersignal reactions
[...] in dogs (cf. Fig.
1 and 3). The duration of the stages
and their relationships to one another
differed in different children, as the 
graphical illustrations in Fig. 4 show. 
The total time required for the for- 
mation of a differentiated condi- 
tioned reaction to time in children 
varied from 29 to 82 repetitions.

It must be noted, however, that the 
formation of a conditioned reaction
to time in this form was noted
dominantly in children of early school 
age. But some of these intersignal re- 
actions appeared in small numbers 
even after a large number of repeti- 
tions, a fact which prevented exact 
observation of all the stages in the 
formation of a conditioned reflex to 
time. In children of middle school 
age the conditioned reaction to time 
appeared in the form of intersignal 
reactions in only 42.8 per cent of all 
cases; in most no intersignal reactions
whatever were seen. It was,
nevertheless, possible to establish a
conditioned linkage to time in this
[...] children, but by a different
[...]tion the appearance of the conditioned
reaction within the limits of
the first signal system in the form of 
intersignal motor reactions. This 
feature in the formation of condi- 
tioned reactions to time should, 
obviously, be more pronounced in 
adults. 

There is no doubt that investiga- 
tions of conditioned reactions to time 
in man are important for the solution 
of a number of practical problems. 
Such problems include the establish- 
ment and stabilization of a rational 
regime of work and rest, the arrangement
of the school-day for children,
the rhythm [...]
nutritional regimes, etc.
[...] the perception and measurement of
time and its features in individuals 
of different ages, and such informa- 
tion might be used to advantage in 
training and education. Thus, further
study of the conditioned reactions
to time in man is closely bound
up with the solution of important 
practical problems in medicine, psychology,
and education.

THE EFFECTS OF HIGH INTENSITY INTERMITTENT SOUND
ON PERFORMANCE, FEELING, AND PHYSIOLOGY¹


ROBERT PLUTCHIK 
Hofstra College 


There is one aspect of the general
problem of the effect of high intensity
noise on man that has been insufficiently
explored; this concerns the
[...] of the effects of irregular or
intermittent sound on performance,
feeling, and physiology. Evidence
that has been accumulating during
the last few years suggests that such
intermittent sound has both different
and more disturbing effects on Ss
than those of steady sound sources.
Such differences may, in addition,
have important theoretical implications
for the understanding of brain
mechanisms.

This report will, therefore, attempt
a review and analysis of the literature
dealing with the problem of the effects
of interrupted sound on various
aspects of human functioning,
particularly as related to perceptual-motor
activity, subjective report, and
body physiology.

In [...] Berrien published a review
of the effects of noise as applied
[...] to industrial environments.
He reported that, although much of
the evidence was inconclusive, there
were some indications that noise
tended to affect work output and
speed of work. This review noted
that there are marked individual
differences in susceptibility to the
ill-effects of noise. No attempt to
distinguish the effects of intermittent
[...] noise was made except to
note the increased unpleasantness of
intermittent sounds.

1 Supported by Contract Nonr-2252(01)
with the Office of Naval Research. [...]
of the United States [...] is permitted.


Kryter in 1950 published an ex- 
tensive review of the literature of the 
effects of noise on men. His conclu- 
sions from studies conducted under 
laboratory conditions indicated that 
experimental studies can be grouped 
under three categories: 

1. Experiments which demon- 
strated deleterious effects of noise. 
Nearly all, if not all, can be heavily 
criticized on one or more points so 
that findings can be accepted only 
with considerable reservations. 

2. Experiments which demon- 
strated slight, inconsistent or incon- 
clusive detrimental effects from noise. 

3. Experiments that demonstrated 
that man can do certain types of 
muscular and mental work as effi- 
ciently and productively in noise as 
in quiet, even for prolonged periods. 
For some few tasks, noise apparently 
improved performance. 

However, it is obvious that previ- 
ous studies have not sampled all the 
different types of behaviors of which 
man is capable. Kryter also notes 
that different kinds of sounds may 
have marked effects on feelings as 
well as on certain physiological meas- 
ures. These will be described in later 
sections of this review. 

Since Kryter’s report, a number of
studies have appeared which indicate 
that noise can affect performance. 

An experiment by Miller (1953) 
tested the effects of a 90 db,” 8000 cps 
tone on the four measures of critical 
flicker fusion, cancelling c's, word 


2 All designations of intensity in this report 
will be decibels with reference to 0 
dynes/cm. $q- =0 db unless otherwise noted. 
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fluency, and trembling. Only the last 
measure showed any effect (i.e., in- 
crease in trembling) as a result of the 
sound. In the Benox report, Miles 
(1953) indicated that Ss exposed to a 
115 db sound for three hour periods 
showed an impaired performance on 
Coordinated Serial Reaction Time— 
i.e., on a task in which the S directed 
a beam of light at a series of targets 
by means of airplane controls. This 
was reported as being the most effec- 
tive of all the tests for showing an 
effect of noise on performance. 
Broadbent (1951a) described an ex- 
periment in which the S was required 
to observe and correct any pointer 
that exceeded the danger mark on a 
series of steam pressure gauges. The 
pointer signals were presented at ran- 
dom in a series of test runs each 90 
minutes long. Ten Ss participated, 
each for five days, and were exposed 
in the sequence: quiet, quiet, noise, 
noise, quiet. Noise levels for quiet 
were 70 db, and noise 100 db. The 
results indicated that Ss performed 
more poorly on the noise days than 
on the quiet days preceding and fol- 
lowing the noise sessions. When the 
task was made simpler (the dials 
were replaced by lights whose bril-
liance was sharply intensified at ran-
dom as a signal), another group of Ss
showed no significant effect of noise. 
Other studies by Broadbent (1951b, 
1953, 1954) were designed to investi- 
gate the effects of pacing on the per- 
formance of a vigilance task under 
continuous 100 db noise conditions.
More errors were made on the task 
under noise conditions than under 
control conditions in each experi- 
ment. However, the results of the 
pacing were inconsistent in two 
studies. In one study it was found 
that fewer errors were made by the 
group under unpaced conditions 
(stimulus presented after every re-
sponse) than by the group under
paced conditions (stimulus presented 
every second). In the other study no 
difference was found between paced 
and unpaced performance. Unlike 
the first study, however, in the latter 
task, performance was prolonged 
over a one and one-half hour period 
suggesting that the Ss may have 
adapted to the pacing and noise con- 
ditions. More errors were made un- 
der the noise condition. 
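
The decibel figures quoted in these studies are level differences on a logarithmic scale, so the step from a 70 db "quiet" condition to a 100 db noise condition is much larger physically than the numbers suggest. The following is a minimal sketch of that arithmetic, assuming the conventional sound-pressure reference of 0.0002 dynes/sq. cm.; the function names are illustrative only and do not come from any of the studies cited.

```python
import math

# Assumed reference pressure for 0 db: 0.0002 dynes per square centimeter.
P_REF_DYNES_PER_SQ_CM = 0.0002


def spl_db(pressure_dynes_per_sq_cm):
    """Sound pressure level in db relative to the assumed reference."""
    return 20.0 * math.log10(pressure_dynes_per_sq_cm / P_REF_DYNES_PER_SQ_CM)


def pressure_ratio(db_difference):
    """Ratio of sound pressures that corresponds to a level difference in db."""
    return 10.0 ** (db_difference / 20.0)


def intensity_ratio(db_difference):
    """Ratio of sound intensities (power) for the same level difference."""
    return 10.0 ** (db_difference / 10.0)


if __name__ == "__main__":
    # A 70 db "quiet" condition versus a 100 db noise condition: 30 db apart.
    print(round(pressure_ratio(100 - 70), 1))
    print(round(intensity_ratio(100 - 70)))
```

On these assumptions a 30 db difference corresponds to roughly a 32-fold difference in sound pressure and a 1000-fold difference in intensity.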

Using a modification of the above 
procedure, Jerison and Wing (1957) 
had Ss watch three clocks side by 
side whose pointers made double 
jumps on a random basis of about 
one per minute. The Ss watched the 
clocks during two hour sessions and 
pressed a key when each double jump 
occurred. Comparing a “quiet” con- 
dition of 83 db with a noise condition 
of 114 db produced by a loudspeaker 
with frequencies of 20 to 9600 cps, 
they found that performance de-
creased significantly in the final half-
hour of the noise condition. No
change in performance occurred dur- 
ing the two-hour “quiet” condition. 

In another clock-watching experi- 
ment Jerison (1956) reported a 
definite impairment of performance 
with increased time in the task situa- 
tion, and concluded that this was due 
partly to noise, partly to fatigue, and
partly to the unreliability (test-re-
test) of the task itself. An important,
but unexpected observation was that 
Ss who showed marked auditory fa- 
tigue after the test tended to perform 
at the same level in noise as in quiet, 
whereas those Ss with only mild hear- 
ing losses tended to fall off from their 
starting performance level. This find- 
ing emphasizes the important fact of 
individual differences.

Another recently discovered effect
of noise is that time estimation is
affected by noise levels. Hirsh, Bilger, and
Deatherage (1956) have reported 
that estimates of durations up to 16 
seconds were systematically distorted 
if the noise levels during the presen- 
tation of the interval and during the 
Ss’ attempt at reproducing that in- 
terval were different. The change 
was in the direction of overestimat- 
ing the interval when the noise level 
was higher during the reproduction 
period and underestimating the inter- 
val when the reverse was the case. 
Jerison and his co-workers (Jeri-
son, Crannell, & Pownall, 1957) also
found an effect of noise on time esti- 
mation. Two hundred Ss, working in- 
dividually, were required to follow a 
moving target visually and to imag- 
ine the continuing movement of the 
target after it disappeared. When the 
target was believed to have reached a 
crosshair, the S squeezed a trigger. 
Noise of about 110 db (frequency 
range: 20-10,000 cps) was introduced
at certain times. It was found that a 
noise program in which it was quiet 
when the target disappeared gave 
longer judgment times relative to 
those obtained under control condi- 
tions of quiet or noise throughout. 
The opposite program of “noise then
quiet” was not differentiated from
the control conditions. It was also 
found that judgment times became 
longer in succeeding trials under all 
four noise conditions. An effect of 
noise on subjective time judgments 
has also been noted by Loeb (1957). 
In addition to these recent studies 
that indicate an effect of noise on 
certain kinds of performance, there 
have been a number of observations 
on the effects of intermittent noise on
performance. Many years ago Cassel
and Dallenbach (1918) found that an
intermittent noise resisted habitua-
tion more than a continuous one.
Similarly, Laird (1933) reported that
for Ss who were required to put a stylus
through small irregularly spaced and
sized holes in a moving tape,
the drop in output was greater
when the noise distractor was made 
to vary in intensity in an eight- 
second cycle than when its loudest 
component was presented as a steady 
sound. 

In two recent studies, K. R. Smith 
(1950, 1951) compared the per- 
formance of two groups of college 
students on number checking, name 
checking and form board tests. The 
experimental group was exposed to a 
series of 100 db intermittent sound 
bursts of 10 to 50 seconds in length, 
but so arranged that the total noise 
time was equal to the total silence 
time during the 30-minute period of 
the tests. It was found that the ex- 
perimental group tried more items, 
scored more correctly, scored more 
incorrectly, and was less accurate 
than the control group. Some of the 
differences were small but significant. 
One other interesting fact emerged: 
with two exceptions, the experi- 
mental group was more homogeneous 
(i.e., smaller SDs) than the control 
group. 

Another experiment which reports 
an effect of intermittent noise on per- 
formance is that of Corso (1952). He 
used a 100 db, 100-3000 cps noise, 
which was introduced intermittently 
throughout the test period while the 
Ss worked on the Minnesota Clerical 
Test and the Minnesota Form Board 
Test. Here, just as in the previously 
cited experiment by K. R. Smith, the 
Ss attempted more test items, got 
more correct, and made more errors, 
even though their performance was 
reported as less variable under the 
noise than under the control condi- 
tions. (Those who had performed 
well on the tests under control condi-
tions tended to do more poorly under the
stress, while on the average, those
who had performed poorly in the 
control session improved their scores 
under stress.) 

In summary, then, recent research
indicates that high intensity noise has an effect
on certain types of complex tasks, 
and that intermittent noise, in the few 
cases where it has been studied, seems 
to have a greater tendency to impair 
performance than steady noise. 

On the basis of this past research, 
three hypotheses are offered: 

1. Performance impairments are 
more likely to occur during high in- 
tensity intermittent sound than un- 
der lower intensity or steady sound. 

2. The relative difficulty of the 
task is an important variable deter- 
mining the effect of noise on per- 
formance. The more difficult the 
task, the more likely will noise be 
disruptive. 

3. Individual differences in reac- 
tion to noise are extremely important 
to consider in evaluating research. 
There is evidence to indicate (Plut-
chik, 1955) that the mean of a group’s
responses may be misleading if there
are marked individual differences or
subgroups within the larger group. It
is hoped that these hypotheses will 
be systematically tested in subse- 
quent research. 
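
The third hypothesis can be illustrated numerically. The sketch below uses invented scores (they are not data from Corso, Plutchik, or any other study cited here) to show how a group mean change of zero can conceal two subgroups that react to noise in opposite directions, as when initially good performers decline and initially poor performers improve. The function and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical scores only: (score in quiet, score in noise) for each S,
# as would be collected with a subject-control design.
SCORES = [
    (60, 52), (58, 51), (62, 55), (59, 50),  # Ss who do worse in noise
    (40, 48), (42, 49), (38, 45), (41, 50),  # Ss who do better in noise
]


def noise_effects(pairs):
    """Change in score (noise minus quiet) for each S."""
    return [noise - quiet for quiet, noise in pairs]


def summarize(pairs):
    """Group mean change plus the mean change within each subgroup."""
    changes = noise_effects(pairs)
    impaired = [c for c in changes if c < 0]
    improved = [c for c in changes if c > 0]
    return {
        "mean_change": mean(changes),
        "mean_impaired": mean(impaired) if impaired else None,
        "mean_improved": mean(improved) if improved else None,
        "n_impaired": len(impaired),
        "n_improved": len(improved),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(key, value)
```

With these invented figures the mean change for the whole group is zero, although every S changed by seven to nine points; only the per-subject differences reveal the two subgroups.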

One other major question is in- 
volved in a study of the effects of 
noise on performance. It is evident 
that a high intensity noise of what- 
ever form can be considered a stress 
stimulus, and the literature dealing 
with stress should have relevance to 
noise and performance studies. A
recent review and critique of studies
of stress (Harris, Mackie, & Wilson, 
1956) presents the following findings 
and conclusions: 

1. There are wide individual dif-
ferences in reactions to stress. The
reasons for these different reactions
have not been clearly identified.

2. The majority of the studies 
have been concerned with the effects 
of relatively short-term stress condi- 
tions, which means that temporary 
compensatory performance is possi- 
ble. In many cases, the durations of 
the stress stimuli have not been re- 
ported. 

3. Apparently no one has syste-
matically investigated the relation
between the intensity of the stress
stimulus and its effect on behavior.

4. Few investigators have at-
tempted to observe the course of be-
havior during stress.

5. Of the many different experi- 
mental designs which have been used, 
it is suggested that the subject- 
control design which compares a per- 
son’s performance under both stress 
and non-stress conditions, is more 
satisfactory than a random group or 
group-control design method. 

In the light of these observations it
is evident that much further research
is needed to clarify inconsistencies
and to develop a theoretical rationale
for the results which have been re-
ported.


THE EFFECTS OF INTERMITTENT
SOUND ON FEELINGS


There are at least three aspects to
this problem that require examina-
tion: (a) the nature of the feelings
that are associated with high in-
tensity noise; (b) the effect of such
noise on threshold, adaptation, and 
auditory acuity; and (c) the special 
subjective characteristics of inter- 
mittent, repetitive, or pulsed sounds. 

With regard to the first aspect, the
literature is almost unanimous: high
intensity noise, even when it may
have no effect on performance, will
generally produce symptoms of dis-
comfort, irritability, and distraction.
To illustrate: In the Benox report
(Miles, 1953) a steady noise of 115 db 
heard for three-hour periods pro- 
duced fatigue and discomfort; Miller 
(1951) had his Ss listen to 111 db 
noise for 30-minute periods and they 
reported irritation, distraction, and 
general disturbance. Blau (1951) re- 
ports that a high intensity noise 
source of 103 db which accompanied 
the administration of various tests 
had practically no effect except to 
tend to arouse “somatic complaints
of specific anatomical location and 
description,” and Mendelson and 
Conway (1947) who exposed 10 
volunteers to jet engine noise for 14 
days and a total of 19 hours (at about 
120 db over the range of 20-8000 
cps), found that seven of the 10 Ss 
reported fatigue, irritability, and 
nervousness.

In spite of this agreement on the 
subjective effects of high intensity 
noise, there have been very few
studies designed to isolate the more 
disturbing aspects of the noise spec- 
trum. One of the few dealing with 
this problem studied the relative an- 
noyance produced by various bands 
of noise (Reese & Kryter, 1944). The 
authors used filter systems to divide 
“white” noise into several bands, and 
five Ss were asked to adjust each 
band of noise to equal a standard 
band (of 1900 to 2450 cps at 94 db) 
in “annoyance” value. It was found 
that frequencies above 2000 cps were 
more annoying than those below it. 
The results showed that “annoy- 
ance” as a characteristic of sound is 
discriminable from loudness, al- 
though with continued testing the 
annoyance and loudness contours be- 
came less separated. 

One other early study dealing with 
this problem (Laird & Coye, 1929)
had 14 Ss compare the relative an-
noyance of eight different frequencies
ranging from 64 to 8192 cps, by the
method of paired comparisons. A 
U-shaped curve resulted with fre- 
quencies of 256 to 1024 least annoy- 
ing, with a marked rise in relative 
annoyance for lower and higher fre- 
quencies. The authors conclude that 
at low intensities annoyance is ap- 
proximately proportional to loudness 
while at high intensities (over about 
80 db), the frequencies below 500 cps 
follow the equal loudness curves, 
while frequencies over 500 are equally 
annoying at lower loudness levels. 
Thus, annoyance is different from 
loudness per se. 

Kryter summarized the findings 
dealing with the annoyance value of 
various noises by citing the aspects of 
a sound which tend to affect annoy-
ance value. These are: (a) unexpect-
edness, (b) inappropriateness (al-
though this is a term which is diffi-
cult to specify), (c) intermittency
(irregular, variable sounds are more
annoying than steady ones), (d) re-
verberation (lack of localizing in-
creases annoyance), (e) loudness (the
threshold at which any sound be-
comes annoying has not yet been un-
equivocally determined), (f) fre-
quency pattern (sounds having their
energy concentrated in the higher
audible frequencies are more an-
noying).

So far as it has been possible to de-
termine, the relative annoyance
value of different kinds of intermit-
tency has not been studied.


FATIGUE EFFECTS OF HIGH 
INTENSITY SOUND 


In spite of many reports on the
after effects of high intensity noise,
including the extensive study by
Davis (1942) during World War II
using intensities up to 130 db and
durations as long as 64 minutes, the
problem has not yet been entirely
clarified. In general, it is known that
in auditory fatigue, duration of stim- 
ulation has a cumulative effect from 
30 seconds up to at least 10 minutes 
(Hallpike & Hood, 1941; Harris, 
1953); that the maximum fatigue ef- 
fects of a given frequency may be a 
half octave higher (Davis et al., 
1943); and that there are very 
marked individual differences in sus- 
ceptibility to auditory fatigue (Har- 
ris: 1953, 1954; Wilson, 1950). The 
existence of marked individual dif-
ferences is attested by the fact that
the least susceptible Ss may return to 
normal in less than seven minutes 
from stimulation which the most 
susceptible do not recover from in 
more than 24 hours (Harris, 1953). 
Hearing loss, according to Harris 
(1953) tends to be a linear function 
of both stimulus duration and stim- 
ulus intensity, but for fatigue-re- 
sistant individuals recovery from 10 
minutes of exposure to noise levels of 
120 db takes only about seven min-
utes. Hearing loss for relatively
short exposures seems to be com-
pletely reversible. For example, a
recent study (Thwing, 1956) re- 
ported that the adaptation produced 
by a 70 db tone of 1000 cps presented
for six minutes is followed by com- 
plete recovery in about one minute 
after the termination of the adapting 
stimulus. 

A test for screening fatigue-sus- 
ceptible individuals has been re- 
ported in the literature by Wilson 
(1950). He exposes the S to a 2048 
cps tone at 80 db intensity for five 
minutes, and threshold shifts great 
enough to prevent the S from hearing 
a tone 10 db over his own threshold 
after two minutes recovery are used 
as a criterion of fatigue. Harris
(1954) suggests that a better criter-
ion is either the time in seconds to re-
turn to within 5 db of own threshold,
or the residual hearing loss after one
minute. In both studies, the test 
tone is presented at 4096 cps since 
this is the region most affected by 
stimulation at 2048 cps. It is of some 
interest to note, in this connection, 
that when white noise, with a fre- 
quency band over most of the audi- 
tory spectrum, was used at 100-120 
db for several minutes, the frequency 
of maximum fatigue, for all observ- 
ers, was reported at between 4000 
and 8000 cps (Hirsh & Ward, 1952). 
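
The screening criteria described above can be stated as simple rules applied to a recovery curve. The sketch below assumes that a curve is recorded as (seconds after exposure, threshold shift in db) pairs at the test tone; the two curves are invented for illustration, the function names are hypothetical, and treating a shift of exactly 10 db as failing Wilson's criterion is an assumption, since the text does not specify the boundary case.

```python
def shift_at(curve, t):
    """Threshold shift (db) at time t, by linear interpolation between readings.

    `curve` is a list of (seconds, shift_db) pairs with increasing times.
    """
    if t <= curve[0][0]:
        return curve[0][1]
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)
    return curve[-1][1]


def wilson_susceptible(curve, criterion_db=10.0, recovery_s=120.0):
    """Wilson-style rule: shift still at or above 10 db after two minutes."""
    return shift_at(curve, recovery_s) >= criterion_db


def harris_recovery_time(curve, within_db=5.0):
    """First recorded time (s) at which the shift is within 5 db; None if never."""
    for t, shift in curve:
        if shift <= within_db:
            return t
    return None


def harris_residual_loss(curve, at_s=60.0):
    """Residual hearing loss (db) one minute after exposure."""
    return shift_at(curve, at_s)


# Hypothetical recovery curves, not measurements from any cited study.
RESISTANT = [(0, 30), (30, 22), (60, 16), (120, 8), (240, 4), (480, 0)]
SUSCEPTIBLE = [(0, 35), (60, 25), (120, 14), (300, 6), (600, 3)]

if __name__ == "__main__":
    for name, curve in (("resistant", RESISTANT), ("susceptible", SUSCEPTIBLE)):
        print(name, wilson_susceptible(curve),
              harris_recovery_time(curve), harris_residual_loss(curve))
```

The two Harris measures are continuous, so they rank Ss by degree of fatigue, whereas the Wilson rule yields only a pass or fail classification.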

Another more recent test of noise 
susceptibility has been proposed by 
Jerger and Carhart (1956) on the as- 
sumption that an ear’s tendency to 
develop a permanent hearing loss is 
related to that ear’s reaction to tem- 
porary acoustic stress. They found 
that out of 178 Air Force jet- 
mechanic trainees only 15 (or eight 
per cent) showed a hearing loss of 10 
db or more on audiometer tests in the
3000-4000 cps range, eight weeks 
after a three day (12 hour) exposure 
period to jet-engine noise (SPL level 
unspecified, but probably in the 
range of 120-140 db). There was a 
slight but significant positive correla- 
tion between the amount of tem- 
porary threshold decrease to a 3000 
cps, one minute duration tone, at 100 
db, and the hearing loss found after 
eight weeks. 

Several relatively recent studies 
have attempted to determine the ef- 
fects of jet-engine noise on hearing 
loss. Mendelson and Conway (1947) 
used 10 Ss exposed to sound inten- 
sities of about 120 db for a total of 19 
hours, and reported that hearing 
losses of 20 to 60 db at frequencies of 
512 to 2048 cps vanished after a
week-end’s rest and did not return. 
Davis et al. (1953) exposed 17 men to 
21 bursts of jet-engine noise (at 126- 
150 db) each lasting 15 seconds, and 
separated by 30-45 second intervals
of relative silence. Audiograms taken
two to three hours after exposure
showed no temporary loss in any S,
although the authors point out that 
people who come in constant contact 
with jets show definite hearing losses. 
Eldred and his co-workers (Eldred, 
Gannon, & Gierke, 1955) found that 
a one minute exposure to jet noise at 
130-135 db produced a slight hearing 
loss followed by complete recovery 
within eight hours. 

In the Benox report (Miles, 1953) a 
study is cited by Silverman (1947) in 
which he was able to locate the pain 
threshold for both normal and hard 
of hearing ears at about 140 db using 
earphones. The sensation of pain
(“it hurts”) was distinguished clearly
from sensations of auditory discom-
fort (“it is too loud”), and of touch
(“it tickles,” or “I feel something in 
the ear”). 
The Benox researchers compared 
the pain threshold in a free field, us- 
ing jet engines as noise source, with 
pain threshold determined by ear- 
phones and concluded that they were 
the same. They also noted that in 
the frequency range between 800 and 
2000 cps the sound became “uncom- 
fortably loud” at sound levels well 
below the pain threshold. This latter 
finding is consistent with the work of 
Hardy (1952) who points out that 
certain sounds, even 85 db, if pro- 
longed over months and years may 
cause some degree of deafness. He 
concludes that sounds which exceed
00 sones per octave band are prob-
ably damaging with long-time daily
exposure but that no damage is ex-
pected if no octave band exceeds 50
sones.

It may be concluded from these
observations and reports that
short time exposures to high in-
tensity noise levels in the laboratory
are not likely to produce any kind of
permanent hearing losses, and that
the effects which do appear are very 
transient, although there may be 
some marked individual differences 
in the speed of recovery. 


SOME SUBJECTIVE CHARACTERISTICS
OF REPETITIVE SOUNDS


When pure tones are presented 
repetitively for very brief durations, 
then at least two important phenom- 
ena come into existence. If the dura- 
tion of each tone is extremely short, 
the tone is heard as a click which 
does not have a discernible fre-
quency. As the duration is gradually
increased a “click-pitch” threshold
is found (i.e., the shortest duration
of a tone which allows the tone to
have some pitch character to it), and
then a “tone-pitch” threshold is
found (the shortest duration of a 
tone in which the tonal character, 
rather than the click character is 
dominant). The exact values of 
these thresholds decrease with fre- 
quency, within the range of 125-8000 
cps, but in no case are they greater than
18 milliseconds for the click-pitch 
threshold, or 25 milliseconds for tone- 
pitch (Doughty & Garner, 1947). 

The second phenomenon referred 
to is a result of the abrupt presenta- 
tion or removal of a tone. If a pure 
tone is interrupted at a given fre- 
quency, not only is the original tone 
obtained but also additional fre- 
quencies which are a function of the 
rate of interruption. A square wave 
pulse modulation produces a spec- 
trum of sideband frequencies with 
maximum energy in the central com- 
ponent. Changing the repetitive 
rate or the duration of a pulse 
changes the spectrum of energy. Ac- 
cording to Garner (1947b) energy 
changes at the rate of 6 db in the 
central component for every doubling 
or halving of either the repetition
rate or duration; each doubling or
halving of the repetition rate or dura- 
tion changes the total energy by 3 db. 
The threshold response is primarily 
determined by the duration of the 
pulse, and not the repetition rate, for 
low rates. This spectrum of sideband 
frequencies adds a click to any pure 
tone that may be used. 
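
The 6 db and 3 db figures follow if the amplitude of the central component and the total energy are both proportional to the fraction of time the tone is on (repetition rate times pulse duration), since doubling an amplitude adds about 6 db and doubling an energy adds about 3 db. The following is a numerical check of that relation, not a reconstruction of Garner's procedure. The 1000 cps carrier, the 8000 samples-per-second rate, and the particular pulse durations are assumptions chosen so that each burst contains a whole number of cycles; the function names are illustrative.

```python
import cmath
import math


def gated_tone(freq_hz, rate_hz, dur_s, fs=8000, total_s=1.0):
    """Unit-amplitude sine switched on for dur_s at the start of each period."""
    period_n = round(fs / rate_hz)   # samples per repetition period
    on_n = round(fs * dur_s)         # samples during which the tone is on
    return [
        math.sin(2 * math.pi * freq_hz * n / fs) if (n % period_n) < on_n else 0.0
        for n in range(round(fs * total_s))
    ]


def central_component_db(samples, freq_hz, fs=8000):
    """Level (db re unit amplitude) of the spectral line at the tone frequency."""
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
        for n, x in enumerate(samples)
    )
    amplitude = 2.0 * abs(acc) / len(samples)
    return 20.0 * math.log10(amplitude)


def total_energy_db(samples):
    """Mean power of the whole signal, in db."""
    return 10.0 * math.log10(sum(x * x for x in samples) / len(samples))


if __name__ == "__main__":
    base = gated_tone(1000, rate_hz=10, dur_s=0.010)
    longer = gated_tone(1000, rate_hz=10, dur_s=0.020)   # duration doubled
    faster = gated_tone(1000, rate_hz=20, dur_s=0.010)   # repetition rate doubled
    for label, sig in (("duration x2", longer), ("rate x2", faster)):
        d_central = central_component_db(sig, 1000) - central_component_db(base, 1000)
        d_total = total_energy_db(sig) - total_energy_db(base)
        print(label, round(d_central, 1), round(d_total, 1))
```

Under these assumptions, doubling either the duration or the repetition rate raises the central component by about 6 db and the total energy by about 3 db; the remaining energy is distributed among the sidebands.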

Although some studies (Garner: 
1948, 1949) have made no attempt to 
eliminate or deal with the click intro- 
duced by the abrupt onset or de- 
crease of a tone, two remedial pro- 
cedures have been mentioned in the 
literature. One way to suppress the 
click due to switching the tone on and 
off is to produce a gradual increase 
and decrease of the tone, rather than 
to use a square wave (Luscher & 
Zwislocki, 1949); Munson (1947) 
used a three millisecond rise and fall 
time for this purpose while Miller 
and Heise (1950) used a 20 millisec-
ond rise and fall time. Another way
of dealing with this problem is to use
a wide-band noise source with a uni-
form spectrum (“white noise”) as the
interrupted tone because the side-
bands produced by the interruptions
fall within the existing spectrum and
do not produce a change in the sub-
jective character of the tones. This
procedure was used by Pollack (1941)
and by Miller and Taylor (1948).
Interestingly enough, the early 
study by Shower and Biddulph 
(1931) compared the effect on rela- 
tive discrimination of the transients 
introduced by abrupt switching on or
off of a tone. They found that the 
only effect was a decrease in the 
Weber ratio for frequencies below 500 
cps. This indicates that the effect, 
if any, of a click cannot be assumed 
to necessarily disrupt or inhibit some 
auditory function, without investiga-
tion.

When tones are presented repeti-
tively in brief pulses, there are sev-
eral parameters that need to be speci-
fied in addition to the intensities and
frequencies used. There is the dura- 
tion of the tone, the repetition rate, 
the percentage of the total time that 
the tone is on, which represents an 
on-off ratio, and the total energy in 
decibels (Garner, 1948). Miller 
(1948) has called the on-off ratio by 
another name, the sound-time frac-
tion, in a study of auditory “flutter.”
In his study he reports that in general 
the same types of relations for audi- 
tory “flutter” hold as do those for 
visual flicker with the exception that 
the critical flutter frequencies (135 
bursts per second for one S and 270 
bursts per second for the other) are 
much higher for fusion than are vis- 
ual flicker rates. 

In an extension of this work on 
auditory flutter, Pollack (1952) used 
flutter rates from 0.4 to 200 bursts 
per second, with five Ss. When the 
relative change in flutter rate (that 
is, Δf/f) is plotted against flutter fre-
quency, a minimum is found in the
region of 10 per second. Pollack 
hypothesizes that this is not unre-
lated to the fact that the alpha
rhythm of the brain is also about 10
per second.


Several interesting studies have
been reported dealing with the ability
to count short repetitive pulses, and
with the loudness of pulses in com-
parison to steady tones. Taubman
(1944), for example, had his Ss judge
the number of dots that would be
sounded while different numbers of
dots, from one to six, were presented
at either 10 per second, 14 per second 
or 18 per second rates. He found rela- 
tively little difference in the number 
of dots judged at these three different 
rates, although knowledge of results
seemed to help. His data suggest
that these three rates cannot be 
easily discriminated. 

Consistent with this finding is the 
work of Cheatham and White (1954) 
and Garner (1951). Cheatham and 
White used a 1000-cycle tone pre- 
sented in pulses of 10 per second, 15 
per second, and 30 per second, with 
each pulse of 11-millisecond duration 
and 70 db intensity. For four Ss, it 
was found that regardless of the ob- 
jective rate used, the subjective rate 
for auditory perception of sound 
pulses approaches a limit of about 9 
to 11 pulses per second. In addition 
it was noted that the variability of 
response increases suddenly for mean 
perceived numbers higher than five. 
Garner used a similar task and pro- 
cedure, and reports the following con- 
clusions: (a) the duration (5-40 
msec) and intensity (up to 94 db) of 
the tone had no effect on counting 
accuracy; (b) the curve for a repeti- 
tion rate of 12 per second is almost 
identical with that for a 10 per second 
rate; and (c) there are very large in- 
dividual differences in counting ac- 
curacy. 

Garner has also reported in an- 
other study (1947b) dealing with 
threshold in relation to repetition 
rate, that as the repetition rate in- 
creases, the threshold decreases, for 
all frequencies used (250-4000 cps) 
although there is a break in the curve 
at rates between two and five tones 
per second. Garner notes that al-
though the total energy in a stimulus
is directly proportional to the repeti-
tion rate, it has been shown that the
ear does very little integrating of
energy beyond a duration
of 200 milliseconds (or five pulses per
second).

Other studies are relevant to
this problem. Mowbray and his co-
workers (Mowbray, Gebhard, & By-
ham, 1956) used 10 interruption rates
of white noise, ranging from 1 to 320 
bursts per second at an intensity of 
70-80 db. These pulse rates were 
given to the Ss as standards and were 
to be matched by a variable pulse 
rate. When the average deviation of
the matchings was plotted against
pulse rate on logarithmic coordinates, 
two functions were revealed, sug- 
gesting a change in the method of fre- 
quency discrimination at about five 
bursts per second. This was inter- 
preted as meaning that the listener is 
able to count the noise bursts from 
one to five per second but not above 
five. 

This interpretation is nearly but 
not quite consistent with the results 
of an experiment by Licklider and 
others (Licklider, Webster, & Hed- 
lun, 1950) dealing with the threshold 
for disappearance of binaural beats 
produced by pure tones. These ex- 
perimenters found that the subjective 
character of beats changes in the 
range between 2 and 10 beats per 
second from periodic fluctuation to 
“roughness,” depending upon fre- 
quency. When Ss were asked to 
count the beats, the curve relating 
threshold to frequency was nearly a 
straight line at 8 beats per second. 
These studies, therefore, suggest that 
the counting ability of a human ob- 
server of tones, pulses, or beats has 
an upper limit definitely less than 10 
and probably less than 8 per second. 

In addition to these facts concern- 
ing the ability to discriminate pulse 
frequencies, there are several reports 
on the loudness of brief repetitive 
tones. Garner (1948) had six Ss 
equate the loudness of a series of re- 
peated short tones with the loudness 
of a steady tone of the same fre-
quency using an 80 db intensity
level. His results showed the follow-
ing:

1. A series of pulses may be louder
than a steady tone of the same in-
tensity level.

2. There is a maximum difference 
in loudness, in favor of the repeated 
short tones, at about 60 db. At 
higher and lower intensity levels, the 
relative advantage in loudness for 
pulse is less. 

3. The loudness of pulses is rela- 
tively highest between 1000 and 4000 
cps. 

4. The most consistent matching 
of loudness is shown at 5-10 pulses 
per second, although a decrease in 
pulse duration tends to increase vari- 
ability of judgments. 

5. Loudness increases much less 
rapidly than does intensity. 

In another related experiment deal- 
ing with binaural loudness matching 
with short tones, Garner (1947a) re- 
ported that the accuracy of loudness 
matching is best both below and 
above tone durations of 20 millisec- 
onds (with repetition rates of 1-10 
per second), and that repetition rates 
from 1-100 pulses/second had little 
effect on the variability of loudness 
matching. He notes, as do many
other investigators, that the differ- 
ences between individuals are much 
greater than the differences within 
individuals. 

In a study designed to examine the
loudness of noise rather than pure 
tones, Pollack (1941) reported find- 
ings somewhat similar to Garner's. 
He found that the interrupted noise 
sounds louder than a continuous 
noise of the same energy, being rela- 
tively greatest at rates of 2-10 pulses 
per second; and that the absolute 
threshold of a white noise (of con- 
stant sound-time fraction) increases 
as the frequency of interruptions in- 
creases from 1-300 pulses per second. 


Pollack points out that the range of 
2-10 pulses per second is the range 
wherein there is enhancement of vis- 
ual brightness of a flickering light; it 
is also where the most acute intensity 
discrimination is found between two 
tones as a function of the difference 
in frequency between the tones. This 
same range also covers the alpha 
rhythm and its first subharmonic. 

In summary it may be concluded 
that humans have certain limitations 
with regard to the counting of rapid 
pulse signals. These limits include a 
range of about 1-12 pulses per sec- 
ond, a range in which certain interest- 
ing threshold, loudness, and varia- 
bility data can be found. It may be 
stated as an hypothesis that such 
findings concerning auditory percep- 
tion may be related to the alpha 
rhythm and various visual phenom- 
ena associated with it. 


PHYSIOLOGICAL EFFECTS OF
LOUD SOUNDS


There are many published reports 
which indicate an effect of loud or 
repetitive sounds on physiological 
function. In his review of 1950, 
Kryter cites two earlier studies, one 
which found a rise in blood pressure 
following a loud unexpected sound 
(Lovell, 1941), and the other, which 
reported a decrease in peristaltic con- 
tractions and flow of gastric juices 
(Smith & Laird, 1930) following two 
10-minute periods of noise, [...]
result. Davis ([...]
[...]old in intensity (about 120-130 db)
[...]. In [...] of this study, Davis
(1950) also found that Ss made slight
responses to subliminal auditory
stimuli as well as to supra-
liminal ones. In an extension of
this type of experimental study to
clinical groups, Malmo and his co-
workers (Davis, Malmo, & Shagass,
1954; Malmo, Shagass, & Davis,
1950), using a 3-second, 1000 cycle
tone ([...] db, 90 seconds apart),
found that a group of psychoneurot-
ics with [...] anxiety showed a
greater and more long-lasting elec-
tromyographic response from the
[...] than a control group of nor-
mal Ss, and that arm muscles showed
more significant differences than head
muscles. There were, however, no
significant differences between schiz-
ophrenics and control Ss.
It should be noted that in the
studies [...] there was some de-
crease in the tension responses
with continued testing, although this
was not reported in Malmo’s study.

In three investigations of the bio-
logical effects of jet-engine noise, two
reported positive effects, and one re- 
ported none. Parrack (Parrack, 
Eldridge, & Koster, 1948) exposed 
men to 10-minute periods of jet-en- 
gine noise at 120-150 db and to a 
1300 cps siren; the Ss reported heat- 
ing of the skin, vibration in parts of 
the body, muscular weakness and ex- 
cessive fatigue, all of which disap- 
peared within a week after cessation 
of tests. Allen, Frings, and Rudnick 
(1948) using 10-second exposures of 
a 20,000 cps tone at 160-165 db 
(re: 10^-16 watts per sq. cm.) reported
that flies, mosquitoes, roaches and 
caterpillars were killed within min- 
utes, while human Ss developed skin 
burns, slight dizziness, and unusual 
fatigue. In contrast to these two reports,
Finkle and Poppen (1948) reported
no measurable effect on a wide
variety of physiological measures of 
one-hour periods of exposure for 
days and two-hour periods for five 
days, to jet-engine noise at about 120 
db. The difference in results between
this study and the preceding two
may be due to the lower intensity
noise and to the difference in the distribution
of energy in the different
frequency ranges.
Several other reports have been
made of the effect of sound on physi[...]
Mendelson (1957) [...] in man [...]
out of 23 Ss showed a [...]
Krauskop[...] reviewed the previous [...]
sounds on vision [...]
findings as being based mainly on
subjective impressions. 

In the Benox Report, Ward (1953) 
presented the results of observations 
of the electroencephalograph of one 
S while he was exposed to siren fre- 
quencies of 245-700 cps at 120-137 
db. When the earplugs were removed 
from one ear, there was a “striking 
and prompt abolition or desynchronization
of the parieto-occipital alpha
rhythm, at the same time good activation
of the EEG in the fronto-temporal
region bilaterally was present,
consisting of desynchronization
with an increase in the low potential,
fast activity. When the eyes were
open, the addition of the siren noise
to one ear, added nothing to the
alpha blockade already present. 
These EEG effects appeared to be- 
come less marked with repeated 
opening of the ear during any given 
run, suggesting the possibility that 
adaptation of central neural circuits 
can occur under these conditions.” 

It is unfortunate that no other Ss
were tested since the two other
studies which measured EEG’s (Finkle
& Poppen, 1948; Mendelson &
Conway, 1947) during exposure to
jet-engine sounds report no, or equivocal,
changes in EEG recordings.
There is another important area of 
research to which attention should be 
drawn, although it might seem quite 
peripheral; this is the work dealing 
with sound-precipitated convulsions 
in animals. The most recent review 
of the literature by Bevan (1955) cov- 
ering the period from 1947 to 1954, 
included 145 titles in the bibliog- 
raphy. The major findings of rele- 
vance here seem to be: 

1. Pure-tone experiments would 
indicate that the most effective fre- 
quencies to produce convulsions in 
rats and mice lie above 8000 cps,
and the most effective intensities
over 100 db.

2. At least one author has found 
that an intermittent pure tone (9500 
cps) with a three-per-second inter- 
ruption rate (a rate similar to that of 
certain bioelectric potentials in epilepsy)
was most effective in producing
convulsions.

3. Stimulation of other sense mo- 
dalities without sound does not seem 
to produce seizures. 

4. Many different variables, both 
in the Ss and in the environment, in- 
fluence the incidence and magnitude 
of seizures. 

In the light of some data to be pre- 
sented in succeeding sections, it is 
possible that the phenomenon of 
audiogenic seizure is one which is not 
limited to rats and mice, but that 
humans too may show dispositions 
toward such a response under certain 
conditions. 


PHYSIOLOGICAL EFFECTS OF 
REPETITIVE SOUNDS 


Relatively few studies have dealt 
directly with this problem although 
there are some which have great theoretical
value; these, largely by Lovett
Doust and his co-workers, will be
described in some detail. One of 
their first papers appeared in the 
British journal, Nature (Lovett 
Doust, Hoenig, & Schneider, 1952). 
They reported that the use of a flick- 
ering light on 25 normal Ss at fre- 
quencies between 3 and 32 flashes per 
second produced marked changes in 
oximetrically determined arterial
blood oxygen-saturation values. 
Flicker rates between 3 and 9 per 
second, and also between 12 and 17 
per second produced a decrease of 
blood oxygen-saturation values, while 
this was normal at 9-11 flashes per 
second, and elevated at frequencies 
of 18-22 per second. The comment
was made in this paper that similar 
changes can be produced by replacing 
the flickering light by a relatively 
low-intensity auditory stimulus mod- 
ulated to the same frequency. 

In their next, more extensive paper 
(Lovett Doust & Schneider, 1952) the 
authors pointed out the rationale for 
such studies: 


Biological rhythms exist in a rich variety 
and almost bewildering profusion to attend 
and equilibrate the physiology of man. Such 
dynamic phasic activity appears not only to 
be intimately concerned with the phenomen- 
ology of life and biological processes in general, 
but is also to be found in purely chemical 
systems. Modalities of the periodicities asso- 
ciated with life can be divided into those ex- 
ternal to the organism—including diurnal and 
climatic variation, sun-spot activity, etc., and 
into those inherent within the individual such 
as the respiratory and cardiac rhythms, the 
menstrual cycle, sleep and awakening. Only 
less well marked are certain psychological 
periodicities such as “cyclothymic” variations
in mood and personality. In the course of the 
present century much painstaking research 
has attempted to link external with internal 
rhythmic activities, significant correlations 
being adduced between seasonal variation 
and, for example, the incidence of psychiatric 
disorder, immunity from disease, tempera- 
ment and behavior, and an impressive array 
of biochemical and physiological variables 
ranging from blood pH, lactic acid and pro- 
tein to breath-holding time, plethysmography, 
tests of hand strength and fatigability, dark 
adaptation time and various tests of urinary 
function.

Neurophysiologically, there is evidence 
both for chemical as well as for electrical pat- 
terns of periodic activity in the nervous sys- 
tem. Repetitive discharges with frequencies 
of 5-10 cycles per second from isolated stel- 
late ganglion preparations were found to be 
capable of change by variation in the ionic 
concentrations of the surrounding medium 
and Kaufman and Hoagland have findings
suggesting a chemical pacemaker closely
identified with cerebral respiration. The
function of the diencephalon as a neural pacemaker
has long been postulated, and it appears
certain that the hypothalamic aggregation
of nuclei with its neural and endocrine
connections must play an outstanding role in
emotional and awareness variations (p. 640).

Using 109 Ss, 58 normal controls 
and 51 hospital patients, they re- 
ported data showing that the arterial 
oxygen saturation levels vary very 
consistently (as described above) for 
normal Ss regardless of the type of 
rhythmic stimulation used. (Photic, 
auditory, and cutaneous stimuli were 
used.) Maximum anoxemia (oxygen 
saturation decrements) occurred at 
5 and 15 pulses per second in the nor- 
mal Ss and they showed in addition a 
summative effect of simultaneous 
sonic and photic stimulation. The 
patient group showed similar but not 
identical stimulation profiles al- 
though some showed unpredictable 
variations in response to the different 
frequencies. 

“By depressing the oxygen levels 
by the choice of optimal stimulation 
frequencies, spontaneous comments 
by the healthy subjects revealed con- 
siderable changes in affect and levels 
of awareness, while, among the pa- 
tients, repressed unconscious ma- 
terial was brought into conscious- 
ness.” Some of the spontaneous 
comments made during the anoxemia 
periods included: “concentration
poor, feel slowed up; tired, drowsy, 
sleepy; irritable, annoyed, fed up; 
desire to stop machine or break it; 
headaches, dizziness, or giddiness.” 

The third paper of the series (Lov- 
ett Doust, 1953) discussed the appli- 
cations of these findings to an analy- 
sis of mental illness. Lovett Doust 
presented the hypothesis that an im- 
portant feature of mental disorder, 
at least in the physiological sphere of 
reference, is a relative anoxemia,
either in the resting state or accom- 
panying the dynamic response of 
psychiatric patients to stress situa- 
tions. The fourth paper of the series 
(Lovett Doust & Schneider, 1954)
presented some preliminary data on a
new treatment for patients with psychiatric
disorders based on rhythmic
sensory stimulation, a method which
Doust calls “Rhythmic Sensory
Bombardment Therapy” or R.S.B.T.
Regardless of the actual effective- 
ness of rhythmic sensory stimulation 
for the treatment of the mentally 
ill, this work is very important in 
showing a basic similarity of physio- 
logical effect of repetitive stimuli in 
different sense modalities. From 
this point of view, it makes sense to 
examine the studies of visual flicker 
as a (possibly) reliable guide to the 
effects of auditory flicker. 

A visual phenomenon similar, in
certain respects, to the one described
by Lovett Doust, was discovered by 
Bartley in 1938. This is the fact of 
brightness enhancement, which is an 
increase in apparent brightness of a 
flickering light, at certain rates some- 
what below those that bring about 
fusion. This has been studied by us- 
ing two targets side by side—one 
steadily illuminated and the other 
intermittently. The intensity and
rate of intermittency of the intermittent
target are varied until the
brightness of the two targets is
matched. As the rate is reduced from
the fusion point less and less intensity 
is required for the intermittent tar- 
get to match the steady one. This 
continues until the pulse rate is in the 
neighborhood of 10 per second, the 
alpha rhythm of the human S. Here 
the intermittent target becomes 
about twice as effective as the steady 
one. With rates lower than this, the 
effectiveness of the intermittent tar- 
get declines again. It has also been 
found that a light-dark ratio of one- 
to-one produces maximum relative 
enhancement and that the pheno- 
menon occurs only with high inten- 
sity stimuli. 


In addition to this Bartley effect, a 
flickering light produces many other 
results in human Ss. At an anecdotal
level, many people have unpleasant 
effects of flicker produced by driving 
through a forest or thicket of regu- 
larly planted trees which consistently 
and sequentially interrupt natural 
light sources; similar effects have 
been observed when the S stands still 
and observes a light through a rapidly 
rotating object such as a fan. In re- 
cent years, flickering lights have 
been used as an aid to diagnosis of 
“petit mal” epilepsy through ob- 
servation of changes in electroen- 
cephalographic pattern. Most recent 
of all is the interest in “photic driv- 
ing,” as a possible clue to the nature 
of (hypothetical) reverberating circuits
within the brain which are
thought to be concerned with consciousness
and perception.

In a fairly extensive and recent 
study of the effects of flickering light 
on human Ss, Bach, Sperry, and Ray
[...] intensity light
flashes at varying frequencies to de- 
termine the effects on subjective dis- 
comfort, tapping rates, walking, pur- 
suit rotor performance, and rifle fir- 
ing. Their results may be summa- 
rized as follows: 

1. Unpleasant subjective effects 
are consistently reported when Ss 
are exposed to diffuse flickering 
light. These effects can be grouped 
into four main categories:


(A) Interference with conscious- 
ness—e.g.: blank mind, 
drowsy, dizzy, paralyzed 

(B) Sensations involving the eyes 
—e.g.: fatigue, irritated, wa- 
tering 

(C) Sensations of muscle twitching 
—e.g.: eyes blinking, facial 
twitching, jumping of body 

(D) Sensations relating to other
parts of body—e.g.: queasy
feeling in stomach, headache,
nausea, chills, tense muscles.
(Although the subjective re- 
ports in this experiment seem 
to have an unpleasant char- 
acter, there is at least one 
other study in which pleasant 
sensations were also reported, 
for example, sensations of 
warmth which may become 
quite pleasurable, or illusory 
conversations, the content of 
which could not be recalled 
(O'Flanagan, Timothy, & Gibson,
1948).)


2. The most consistently effective
frequency for the production
of subjective effects is 9 flashes per
second, although any particular frequency
is not extremely critical between
the limits of 7 and 20 flashes
per second. The effects are not cumulative
with time of exposure beyond
[...] five minutes, while maximum
effects occur with high brightness of
the field of view. Monochromatic
light seems to hold no advantages
over white light as indicated by
preliminary tests with red, blue, and
white light. Some degree of drowsiness
was reported in all cases where
[...] was modulated by the
[...] 9 cps EEG activity of
the [...]. Hand-eye coordination was significantly
impaired by a flickering
light for one task (tapping) but not
affected on another supposedly more
[...] tapping task. Rifle firing accuracy
was significantly depressed
(approximately 50%) when a 6
[...] light was placed behind
the target at brightness levels of the
order of magnitude of the scattered
[...] searchlight beams, but when
the direction of the diffuse flickering
[...] toward the target, rifle firing
accuracy increased. Rate of walking
under conditions of diffuse flickering
light did not seem to be significantly
affected, even in the presence
of obstacles and with a continuously 
moving light source. 

This report therefore, as well as a 
host of previous studies (Ellingson, 
1956) indicates clearly the marked 
effects which high intensity intermittent
stimuli may have on behavior,
feelings, and on certain physiological 
measures. In fact it has been re- 
cently possible to develop an index 
of anxiety proneness by examination 
of the EEG reactions to photic stimu- 
lation. Ulett et al. (1953) tested 191 
Ss, both patients and controls, on 
various psychological tests and by 
interviews, and had them rated for 
anxiety proneness under stress. EEG 
records were then taken during con- 
ditions of resting and intermittent 
photic stimulation. The results indi- 
cated that there was a significant cor- 
relation between the psychological 
criteria of anxiety proneness and: the 
percentage of fast, slow and low amplitude
alpha in electronically analyzed
resting EEG responses; the
amount of harmonic responses in the 
20-30 cps range when flicker fre- 
quencies of one half or one fourth of 
this were used; and the amount of 
subjective dysphoria produced by 
photic stimuli. A check list of EEG 
anxiety indicators derived from this 
correlated +0.51 with the validating 
criteria of anxiety proneness. 

As far as could be determined
there is only one study which has reported
an attempt to “drive” brain
rhythms by means of intermittent
auditory stimulation. This study
was described very briefly in an abstract
of a talk to an EEG Congress
by Goldman (1952). He reported
that pure tones, which were interrupted
in rhythmic fashion at rates
of 1.5 to 25 per second, were used and
that EEG changes appeared showing
acoustic driving in temporal areas in
two out of eight cases. No other data
were given and it is evident that repe- 
tition of this study would be interest- 
ing and fruitful. 


SUMMARY AND CONCLUSIONS 


This review has attempted to sum- 
marize and integrate a number of 
articles dealing with the effects of 
loud and intermittent sounds on 
human behavior, feeling, and physi- 
ology. Most of these studies have been 
published since 1950 when the last 
comprehensive review was written.

Some of the more recent experi- 
ments demonstrate effects of very 
loud sounds on certain kinds of complex
behavior, particularly “clock-watching”
and time estimation, with
the possibility implied that the dec- 
rement in performance may depend 
on the level of the sound as well as 
on its intermittency. 

With regard to the effects of high 
intensity or intermittent sound on 
feeling, the literature is almost unan- 
imous: high intensity noise, even 
when it may have no effect on performance,
will generally produce
symptoms of discomfort, irritability,
and distraction, although there is
little known about the relative annoyance
value of different kinds and
levels of intermittency. 

Certain unique subjective charac- 
teristics of repetitive sounds are de- 
scribed relating to the effects of vari- 
ous rates of repetition on fusion, es- 
timation of pulse frequency, tonal 
character, threshold, and loudness. 
The greatest effects are usually 
obtained at repetition rates between 
5 and 10 pulses per second, a fre- 
quency range which coincides more 
or less closely with the alpha rhythm 
of the brain. 

With regard to the effects of loud 
or intermittent sounds on physiology, 
changes in blood pressure, gastric 
secretion, pulse rate, palmar sweat- 
ing, respiration, muscle tension, the 
electro-encephalogram, and blood-oxygen
saturation have been reported
in various studies.

Some theoretical concepts are pre-
sented which postulate effects of
auditory intermittency parallel to
those of visual flicker. In general, the
need for more research on the effects
of intermittent sounds of various fre-
quencies, repetition rates, and in-
tensities is evident from this review.


METHODOLOGICAL CONSIDERATIONS IN THE CONSTRUCT
VALIDATION OF DRIVE-ORIENTED SCALES 
University of Arkansas


Studies concerned with the dy- 
namogenic effects of anxiety, as 
measured by the Taylor Manifest 
Anxiety Scale (MAS), on learning 
and performance (e.g., Spence & 
Farber, 1953; Spence, Farber, & Mc- 
Fann, 1956; Taylor & Chapman, 
1955; Taylor & Spence, 1952) have 
typically regarded the MAS as a dis- 
criminating measure of generalized 
drive (D). Evidence for the validity of 
the test is considered present when- 
ever extreme groups, assigned on the 
basis of test score, perform in the 
direction hypothesized by related D 
theory. The particular theory in- 
volved has been treated in detail by 
Farber (1955), Taylor (1956), and 

Spence (1958). 


CONSTRUCT VALIDITY 
AND THE MAS 


Experimental validation of a test 
within a theoretical framework rep- 
resents an attempt at construct vali- 
dation. Basic to construct validation, 
as conceived by Cronbach and Meehl 
(1955), is the existence of a nomo- 
logical network that relates the con- 
struct to observables and to other 
constructs. Extending the validity of 
a construct would involve, according 
to Cronbach and Meehl (1955),
“. . . elaborating the nomological
network in which it occurs, or of in- 
creasing the definiteness of the com- 
ponents” (p. 290). They further 
stated that the validation of a test 
claiming to measure a construct
would require the existence of a
nomological net surrounding that
construct. The recent article by
Spence (1958) attempted, in fact, to
describe the nomological net sur- 
rounding the construct of emotion- 
ally based D. 

In their criticism of the construct 
validity of the MAS, Jessor and 
Hammond (1957) questioned the ex- 
tensiveness to which various aspects 
of the nomological net surrounding D 
have been investigated experimen- 
tally and alternative inferences dis- 
confirmed. They accurately noted 
that MAS studies have concentrated 
specifically on the energizing com- 
ponent of D and have generally ig- 
nored other, perhaps equally impor- 
tant, aspects of the net. The Jessor 
and Hammond criticism is certainly 
germane not only to the MAS but to 
other scales purporting to measure D. 

The present writers, moreover, are 
inclined to the critical view that even 
those studies dealing with the dy- 
namogenic aspects of the net en- 
compassing D have, in general, con- 
tributed little to a systematic con- 
struct validation of the MAS and 
other D-oriented scales. The literature
in the area fairly abounds with
studies yielding conflicting or equivocal
results. Experiments on serial
and paired-associates learning, in
particular, have been conspicuously
contradictory in their findings. On
one hand, Spence, Farber, and McFann
(1956), Spence, Taylor and
Ketchel (1956), Taylor and Chapman
(1955), Taylor and Spence
(1955), and others (Montague, 1953;
Ramond, 1953) have reported evidence
consonant with the nomologies
present in the D net; on the
other hand, Saltz and Hoehn (1957),
Katchmar, Ross, and Andrews
(1958), Sarason (1957a and 1957b),
Hughes, Sprague, and Bendig (1954), 
and others have provided evidence 
against the expectations of the theory. 
The present writers contend that a 
major difficulty hampering and con- 
fusing research in this area has been 
the tendency to confound interac- 
tional effects within the nomological 
net with certain methodological prob- 
lems inherent in the research design. 
The express purpose of this paper is 
to point out some of these pertinent 
methodological problems and to sug- 
gest some possible ways of resolving 
them. 


METHODOLOGICAL PROBLEMS IN 
THE NATURE OF THE TASK 


According to the proponents of D 
theory, both D level and the relative 
strengths of competing responses 
must be accounted for in predicting 
the effect of D upon performance. 
Taylor (1956), for example, explicit- 
ly stated the position as follows: “In 
situations in which a number of 
competing response tendencies are 
evoked, only one of which is correct, 
the relative performance of high 
and low drive groups will depend 
upon the number and comparative 
strengths of the various response tend- 
encies” (p. 304). In simple learning 
situations such as conditioning where 
a single response tendency is to be 
acquired, the prediction from D 
theory is straightforward. High D 
groups are predicted to condition at 
a faster rate than low D groups.
Considerable experimental evidence 
would seem to support this predic- 
tion based upon the MAS. In verbal 
learning, and especially paired-associates
learning, the prediction would
be based upon the amount of interference
or competition within the list.
D theory would predict that with little
intratask interference a high D
group should be superior to a low D 
group; however, on tasks involving 
considerable intratask competition, 
the low D group would be predicted 
to be superior. The validity of these 
nomologies within the D net, inci- 
dentally, has been questioned by 
Hill (1957) on purely theoretical 
grounds. 

Because of the D level-response 
competition interaction that has been 
postulated to operate within the 
nomological net, research relating D 
level as measured by D-oriented 
scales (chiefly the MAS) to per- 
formance on paired-associates tasks 
requires a “control” over the degree 
of competition within the task. The 
empirical results of such studies 
would then represent evidence for a 
construct validation of the scale. A 
crucial condition, therefore, in these 
validity studies is the degree to which 
E has controlled intratask interfer- 
ence. The typical procedure is to se- 
lect nonsense syllables or adjectives 
in such a way that the similarity of 
the material and the association val- 
ue of the material within the list can 
be conveniently manipulated. Simi- 
larity is usually manipulated by 
varying the letter content of items 
within the list, and association value 
is regulated by selecting syllables 
from Glaze’s (1928) or Hull's (1933) 
tables with previously calibrated as- 
sociation values. By such manipu- 
lations, E derives a list which he con- 
siders representative of either a com- 
petitional or noncompetitional task. 
As Saltz and Hoehn (1957) point out,
this procedure confounds amount of 
intratask competition with the diffi- 
culty of the task. They contend that 
an increase in response competition is 
accompanied by an increase in difficulty
of the list. They conducted several
studies in which they attempted
to partial out difficulty from compe- 
tition. As a result, their findings 
failed to support D level-response 
competition theory. 
In manipulating lists of nonsense 
syllables or adjectives in terms of 
similarity and association value for 
purposes of varying intratask inter- 
ference and/or controlling for task 
difficulty, E seems to be operating 
under the basic premise that compe- 
tition within a list is independent of 
D as measured by his D-oriented 
scale. We seriously question this 
premise particularly as it applies to 
the association value of nonsense 
syllables and adjectives. The cali- 
brations of Glaze and Hull, it may 
be recalled, were based upon groups 
undifferentiated on any D dimen- 
sion. It does not seem unreasonable 
to suspect that performance on D 
scales and number of associations to 
nonsense syllables may have some 
covariance. If true, lists constructed 
from such calibrations would not be 
comparable lists in terms of compe- 
tition and difficulty for high and low 
D groups selected on the basis of 
scale score. Some adequate empirical 
demonstrations of independence between
these variables are clearly
needed.


METHODOLOGICAL PROBLEMS IN 
DRIVE MEASUREMENT


Another important methodological 
problem encountered in construct 
validation studies on D-oriented 
scales pertains to the conditional defi- 
nition of D which the use of such 
scales involves. These difficulties 
have been discussed by Jessor and 
Hammond (1957). They concluded 
with the statement: “When a con- 
struct implies a relationship between 
variables, these variables must be 
designated independently of any test
of that relationship” (p. 169). The
methodology commonly employed in 
studies involving D scales and com- 
plex learning tasks has departed 
grossly from this important qualifi- 
cation. The scale has been employed 
both to establish the validity of the 
construct (D) and simultaneously to 
establish the construct validity of 
the scale. Under such confounding 
dual purposes, failure of the data to 
fulfill the predicted outcome cannot 
be taken as substantive evidence for 
either an absence of construct valid- 
ity in the scale or for an incorrect 
nomological net. The results of such 
studies serve mainly to confuse and 
cloud the issue. 

As a way out of this protracted 
dilemma, the writers suggest that re- 
search in this area begin to utilize ex- 
perimentally induced D states as 
controls for evaluating the effects of 
response-inferred D states. We fur- 
ther recommend positive, nonemo- 
tional, approach drives in preference 
to negative, emotional, avoidance 
drives, such as shock or threat of 
shock, which are fraught with so 
many unsettled theoretical problems 
in their own right. The nomological 
net surrounding the positive drives is 
more clearly defined and has received 
wider empirical support than is gen- 
erally the case for the negatively 
based drives. 

Since experimentally induced D 
would be considered as contributing 
to generalized D, predictions based 
on this form of D should be compar- 
able to those based on D inferred 
from performance on a scale. For ex- 
ample, groups performing under high 
and low incentive conditions should 
be at least partially equated with 
high and low D groups selected on 
the basis of MAS scores. Predictions 
of this nature would extend beyond 
mere rate of acquisition in complex 
learning and would include such diverse
phenomena as intentional ver-
sus incidental learning, positive and 
negative transfer, extinction, etc. 
The kind of controls advocated here 
would not, of course, eliminate from 
consideration other interpretations 
pertaining to the construct validity of 
D scales. Predictions concerning 
emotionally based D upon an interfering
response theory, as exemplified
by Sarason and Mandler (Mandler
& Sarason, 1952; Sarason, Mandler,
& Craighill, 1952), or upon a
habit theory, as exemplified by Hilgard
(1953) and Child (1954), would
be difficult to test discriminately by
any method. The procedure propounded
here has as its basic merit
the elements for establishing a more
well-defined baseline in an area besieged
with many intricacies.

Two alternatives are suggested
here for providing the type of control 
discussed above. The first consists of 
a replication of a carefully designed, 
well-conceived experiment in which 
an experimentally induced positive 
D had been manipulated and clear- 
cut results obtained. The replication 
would consist of repeating the task 
and procedure as closely as possible 
with groups selected as high and low 
scorers, respectively, on the D scale 
to be validated. Such replication 
would require a preliminary demonstration
that competition within the
material to be learned showed no covariance
with score on the scale.
A study in preparation by the writers
exemplifies this approach. It consists
of a replication of a study by
Bahrick (1954) in terms of material
[...] and procedure followed, but
extended to a different D dimension.
Bahrick had demonstrated that
a high incentive (financial reward)
group displayed significantly greater
intentional learning but significantly
less incidental learning than a low incentive
(no financial reward) group.
The learning task consisted of the 
serial learning of geometric forms. 
Since the nature of the task is one 
that probably involves little intra- 
task competition, it should differ- 
entiate high and low D groups se- 
lected by MAS score in a similar 
manner. That is, D theory would 
predict that high scorers on the 
MAS should show greater intentional 
learning but less incidental learning 
than low scorers. 

We found, as in the Bahrick study, 
that the high D group (now identified
by MAS score), comparable in intelligence
and sex distribution to the
low D group, displayed a significantly 
higher rate of intentional learning. 
Unlike the Bahrick study, however, 
the low D group did not display su- 
perior incidental learning. The re- 
sults would therefore seem to conflict 
with the finding by Silverman and 
Blitz (1956) that high scorers on the 
MAS showed significantly less inci- 
dental learning on a serial list of non- 
sense syllables with low association 
values than low scorers. Silverman 
and Blitz interpreted their findings 
in terms of interfering responses correlating
with anxiety. Unfortunately,
the lack of information concerning
the comparability of the list of syl- 
lables for the two groups and the lack 
of control groups performing on the 
same task under more operationally 
defined D conditions make it difficult 
to evaluate such results as either 
supporting or rejecting the construct 
validity of the MAS. 

The second suggestion [...]
follows in a natural [...]
first. Studies that [...]
the construct validity [...]
should include within their research
design, whenever possible, [...] comparable
groups performing the same task
as the experimental groups but under
varying levels of an experimentally 
induced approach D. The perform- 
ance of such control groups would 
then provide a more adequate basis 
for evaluating the effects of D in- 
ferred from scores on a D scale. A 
recent study by Katchmar, Ross, and 
Andrews (1958) illustrates this point 
nicely. They compared rate of learn- 
ing for a coding task between groups 
differentiated in D level in terms of 
MAS anxiety, ego involvement, and 
failure-induced stress. Their design 
is certainly excellent as far as making 
relative comparisons between these 
three variations of emotionally based 
D. In their discussion, however, they 
state the results for high and low 
MAS scorers do not support the theo- 
retical formulation of Taylor and 
Spence. Thus they are inferring that 
the study does not support the construct
validity of the MAS. In our
opinion, a more adequate test of the 
construct validity of the MAS in 
their study would require two addi- 
tional control groups, consisting of 
more clearly defined high and low D 
groups, where D is manipulated experimentally
in a nonemotional, positive, approach direction.


SUMMARY


Experimental studies directed at
establishing the construct validity of
D-oriented scales, such as the Taylor
MAS, are beset with theoretical and
methodological problems that make
it difficult to interpret their results.
This is particularly true in studies
that relate response-inferred, emo-
tionally based D to verbal learning.
As a partial answer to these prob-
lems, the writers contend that studies
employing verbal tasks should re-
quire a prior demonstration of com-
parability of the task for extreme
groups identified by D scale perform-
ance. This need is dictated by the
emphasis placed on the interaction
between D level and intratask com-
petition in contemporary D theory.
The writers further contend that re-
search in this area requires informa-
tion collected from control groups
performing on the same task as the
D scale groups under high and low D
conditions that represent clearly de-
fined, experimentally induced moti-
vational states. The information
thus provided would serve as a base-
line for evaluating the evidence for or
against the construct validity of the
D-oriented scale.
ON THE CLASSIFICATION OF PROJECTIVE TECHNIQUES¹

Classification is a menial task! It is
generally considered to be tedious, 
unexciting, and an activity, at least 
for American psychologists, to be 
delegated to someone else. In spite 
of this deficiency in allure, and the 
general tendency of most psycholo- 
gists to prefer the immediately more 
rewarding activities of experimental 
analysis or broad theoretical formula- 
tion, there seems little doubt that 
some emphasis upon classification is 
important in every branch of psy- 
chology. The concern of the present 
paper is with a relatively humble 
problem of taxonomy—the classifica- 
tion of projective techniques. 

I shall summarize briefly a num- 
ber of approaches to this problem 
and suggest a basis for classification 
that seems to me superior to the vari- 
ous alternatives. Such an enterprise 
is obviously of interest to those who 
teach or theorize about projective 
techniques, for some kind of order 
must be imposed upon this diverse 
array of instruments if they are to be 
discussed efficiently and intelligently. 
Moreover, a classification that can 
be agreed upon, and that seems to 
make psychological sense, should 
serve some function in research and 


1 This paper is an outgrowth of a mono- 
graph on the use of projective techniques in 
social research initiated by the former Committee
on Social Behavior of the Social Science
was supported by Research Grant M-1949
Health Service. I am grateful to my colleague
Ephraim Rosen for his suggestions and to 
Arthur Hill for assistance in preparing the 
tabular material. 
applied settings, where the interrelation
between various projective techniques
frequently appears to be a
matter of vast importance. 

Before turning to a definite pro- 
posal in regard to classification, let 
us consider some of the suggestions 
that have been made in the past. A 
number of prior observers have con- 
cerned themselves with the ways in 
which projective techniques can be 
clustered or grouped, but probably 
the first and best known of such at- 
tempts is contained in the pioneer 
article by Frank (1939), recommend- 
ing the use of projective techniques. 
He suggested that these instruments 
could be distinguished in terms of 
whether the responses they elicited 
were constitutive, interpretative, ca- 
thartic, or constructive. The test 
may be considered constitutive if the 
S is required to provide a structure or
form for relatively unstructured or
ambiguous stimuli, such as finger
paints or Rorschach cards. When
the S is asked to indicate what the 
meaning of the stimulus is to him, for 
example, if he is asked to assign 
meaning to a picture, the test is con- 
sidered to be interpretative. The 
cathartic test involves some deliberate
attempt to induce the S to express
or release emotion in the process
of reacting to the stimuli, as in the 
case of doll play or psychodrama. If 
the S is required to build or organize 
stimulus materials, such as blocks or 
toys, in such a manner as to reveal
“some of the organizing conceptions
of his life” (p. 403) the test is labelled
constructive. In a subsequent formu- 
lation Frank (1948) has added the 
category of refractive techniques.
These are devices that depend upon 
error or distortion in the S's judg- 
ment of some set of stimuli, and are 
typified by instruments involving 
tachistoscopic presentation of stimuli 
related to a given motive or conflict. 

Helen Sargent (1945) has consid- 
ered the problem of classifying pro- 
jective techniques somewhat more 
broadly, suggesting that these de- 
vices may be grouped in terms of: 
(a) the nature of the materials pre- 
sented to the Ss; (b) the functional 
use the S makes of the materials; (c) 
the method of presentation used by 
the examiner; and (d) the purpose 
for which the test is employed. She 
suggests that under the heading of 
materials, the tests may be distin- 
guished in terms of whether they in- 
volve presentation of ink blots, pic- 
tures, stories, art media or sounds. 
The categories proposed under the 
heading of functional uses employed 
by the S are the same as those ini- 
tially recommended by Frank—con- 
stitutive, interpretative, cathartic, 
and constructive. In distinguishing 
between tests on the basis of method 
or technique of presentation, Sargent 
considers variations in both presenta- 
tion and interpretation. The main 
distinctions in presentation she dis- 
cusses have to do with the degree of 
standardization, or “experimental
control,” that is imposed upon the 
examiner and S. Differences in in- 
terpretive approach that she identi- 
fies have to do with an emphasis upon 
empirical origin, as opposed to the- 
oretical derivation, of the interpreta- 
tive system. She suggests distin- 
guishing the tests in purpose accord- 
ing to whether they are used princi- 
pally for diagnosis, therapy, or ex- 
periment. 

In a discussion of the indirect
measurement of attitudes, Campbell
(1950) suggests three classificatory
principles that have implications for 
projective techniques. First of all, 
there is the question of whether the 
device is disguised or not, that is, 
whether the S can estimate accu- 
rately the intent of the examiner. This 
dimension might appear to have no 
utility in the present context, as vir- 
tually all projective techniques are 
assumed to be disguised, but there is 
actually a moderate degree of varia- 
tion among projective techniques 
along this dimension. Second is the 
question of whether the instrument 
is structured or not. Campbell ap- 
pears to use this term to refer to both 
the ambiguity of the stimulus and 
the amount of freedom permitted the
S in determining how he will respond.
Again these are qualities that are in- 
volved in the differentiation of pro- 
jective techniques from other per- 
sonality devices. Nevertheless, there 
is considerable variation between 
projective techniques along these 
dimensions and they might well provide
the basis for important classificatory
distinctions. Third is the distinction
between “voluntary self-description
as opposed to diagnosis based
upon differential performance in an
objective task” (p. 15). Virtually no
projective technique can be considered
to depend upon “voluntary self-description”
so that this dimension is
of only passing interest in its present 
form. 

Recently, Campbell (1957) has 
presented a revision of this analysis 
which is intended to refer more specifically
to projective techniques.
The new formulation includes three 
polar dimensions: voluntary versus ob- 
jective (Is the S to report something 
accurately or is he to provide his 
“own” or “first” response, without
regard for correctness?); indirect versus
direct (Does the S know the purpose
of the test?); and free-response
versus structured (Can the S respond 
much as he chooses or must he select 
from a limited array of alternatives?). 
These dimensions are then combined 
to describe various kinds of psycho- 
logical tests and Campbell considers 
examples of the resultant types. He 
concludes that most projective tech- 
niques are voluntary, indirect, and 
free-response but some projective 
techniques can be described as volun- 
tary, indirect, and structured; volun- 
tary, direct, and free-response; objec- 
tive, indirect, and free-response; and 
objective, indirect, and structured.

Cattell (1951) has suggested that
the fundamental process involved in
projective tests is not projection but
misperception and that these devices
should, consequently, be called
“misperception techniques.” Furthermore,
he has indicated that such devices
may be divided into four different
classes depending upon the form
of misperception that operates. First,
there is the instrument that depends
on naive misperception, where the S is
unable to recognize the fact that
others feel and think differently than
he does and, as a result, generalizes
his own perceptions to everyone else.
Second, there is the test that utilizes
the process of autism, where the S
modifies or distorts his perception in
such a manner as to satisfy or reduce
his needs and desires. Third are the
instruments that involve press compatibility
misperception, where the S
views the environment as existing in
such a form as to fit, or make reasonable,
his motives and affective states.
Fourth are the devices that depend
upon ego defense misperception where
the distortion in perception takes
place at the service of unconscious
and repressed motives, in a form determined
by the various mechanisms
of defense.

There are many additional distinctions
between projective techniques
that can be proposed. For example, 
we may point to the differences be- 
tween structural or formal techniques, 
as opposed to “meaning” or content 
techniques. Here the distinction has 
to do with whether, in interpreting 
the test, the focus of the examiner is 
upon the way in which the task is 
performed—the speed and quantity 
of response, the relative frequency of 
certain types of words, the tendency 
to respond to all or to part of the 
stimulus, etc.—or upon the meaning- 
ful outcome of the performance. The 
formal device is concerned with cer- 
tain quantifiable aspects of the re- 
spondent’s general pattern of re- 
sponse, and there is little or no in- 
terest in the content or meaning of 
what the respondent is saying or do- 
ing. The Rorschach technique is 
usually considered to be primarily a 
formal test, although there is consid- 
erable evidence for a shift in recent 
years toward more extensive use of 
content in interpretation. If the in- 
terpretation is focussed upon what the 
individual says or does and its meaning,
or the thematic connection between
various response elements, the
instrument would be classified as a
content technique. Illustrative of
this type of instrument is the customary
use of the Thematic Apperception
Test.

Further, we might distinguish between
those tests that are administered
individually, as opposed to
those that are capable of group administration.
Actually, this is a difficult
distinction to maintain, for as
soon as someone develops an individual
technique that seems to possess
utility, there are certain to be a
number of investigators eagerly seek- 
ing to adapt the technique for group 
administration. Nevertheless, at any 
given point in time it is possible to
distinguish between tests in terms of 
how readily they can be adapted to 
meet the demands of group admin- 
istration. For example, the sentence 
completion test can be given in 
group settings very readily, while 
doll play or word association tech- 
niques are considerably more difficult 
to administer outside of the indi- 
vidual session. 

One might classify projective de-
vices in terms of the sense modality 
involved. For example, there are 
visual, auditory, and even tactual 
stimuli employed in tests now in use. 
An additional distinction between 
these devices can be made in terms 
of the degree of response multiplicity 
permitted by the technique. There 
are a few techniques that require the 
S to choose between a small num- 
ber of specified alternatives, for ex- 
ample, the Szondi Test; while others 
permit a theoretically (and almost 
practically) limitless number of re- 
sponses, as in the TAT. 

Similar to one of the distinctions 
proposed by Sargent, is the difference 
between rational and empirical tests. 
On the one hand, we have techniques 
where there is no rationale provided 
for the fact that a given type of re- 
sponse seems to be associated with a 
given personality characteristic. Nor 
is the individual who develops such 
an instrument concerned with this 
state of affairs. As long as there is a 
firm association between a particular 
type of test response and a given per- 
sonality variable, he believes that the 
test may be used in a dependable and 
useful fashion. The extreme of this 
approach implies only an interest in 
the empirical regularity, with no concern
for underlying processes or intermediary
factors. On the other
hand, we have techniques where 
there is a reasonably careful attempt 
to provide a prior rational or theoretical
basis to justify the use of a particular
response element as diagnostic
of a particular personality attribute.
In practice, it is clear that all
devices represent mixtures of these 
two extremes. The individual who 
professes to be disinterested in theory 
and concerned solely with empirical 
association must make some decision 
about where to look for his empirical 
regularity, and here he obviously 
must drag in some “theory” or prior 
assumptions. On the other hand, the 
individual who is interested in a prior 
rationale, if he is sophisticated, must 
show considerable curiosity about 
whether his theoretically predicted 
relationship is, in fact, sustained in 
the world of reality and, thus, he in- 
troduces the crass empirical criterion. 
In spite of this overlap, many instru- 
ments appear to be more heavily in- 
fluenced by prior theorizing than 
others. In general, the Blacky Pic- 
tures have developed with a close 
relationship to explicit theory, while 
the Rorschach, during at least much 
of its development, seems to have 
been treated as an empirical device. 

The potential systems of classifica- 
tion we have considered are by no 
means exhaustive but they serve to 
illustrate the rich variety that offers 
itself to the person who surveys this 
area. How to choose between all of 
these alternatives? Perhaps an 
answer to this question can be pro- 
vided by attempting to classify the 
classifications. That is, if all the principles
of classification can be grouped
together, it may be possible to select 
from among them the avenue that 
seems most fruitful. 

From what has already been said, 
it is clearly possible to distinguish between
six different approaches to the
classification of projective tech- 
niques. First, there is the distinction 
based upon attributes that inhere in
the test material itself. Here we are 
concerned with variation in the 
stimulus material, for example, struc- 
tured versus unstructured or audi- 
tory versus visual. Second, we may 
classify the tests in terms of the 
method by which the technique was de- 
vised or constructed, for example, the 
distinction between rational and em- 
pirical techniques. Third, we may 
distinguish between these devices on 
the basis of the manner in which the 
test is interpreted, for example, formal 
analysis versus content analysis, or 
“sign” interpretation versus holistic 
interpretation. Fourth, we might
propose a classification that is based 
upon the purpose of the test, for ex- 
ample, the assessment of conflict as 
opposed to the measurement of mo- 
tives, or the general description of 
personality as opposed to the estima- 
tion of specified dimensions of per- 
sonality. Fifth, we might propose a 
set of categories that are concerned 
with differences in the administration 
of the test, for example, group tech- 
nique as opposed to individual tech- 
nique, or self-administered versus ex- 
aminer-administered. Sixth, we can 
distinguish between the instruments 
on the basis of the type of response 
they elicit from the S, for example, 
story construction as opposed to as- 
sociation. 

All of these distinctions have some 
usefulness and something can be said 
in favor of each of them as providing
the best means for classifying projec- 
tive techniques. In spite of this, I 
would argue that the final type of 
classification, the one based upon 
differences in type of response, is 
easily the most important and the 
one that merits emphasis. The essen- 
tial consideration here is that this 
classification seems most likely to be 
closely related to the underlying psychological
processes involved in the
various tests, for it is this classifica- 
tion that points to what the S is ac- 
tually doing. In so far as these tests 
are distinctive, and to be treated as 
significantly different, it seems likely 
that the major determinant of this 
distinctiveness will be the differences 
in what the S is actually engaged in 
as he responds. It is also worth noting
that a number of the other types of 
classification are more or less directly 
specified by distinctions based upon 
mode of response, for example, if the 
technique elicits choice responses, we 
know a good deal about whether it 
will emphasize formal or content 
analysis, whether it is likely to be 
capable of group administration, and 
whether it will be structured or not. 

Even if we agree that distinctions 
between projective techniques based 
upon variation in the type of response 
elicited from the S are most impor- 
tant, there is still the task of arriving 
at just the proper array of such dis- 
tinctions. For most purposes it 
seems sufficient to think in terms of 
five general types of response. These 
are: (a) association, (b) construction, 
(c) completion, (d) choice or order- 
ing, and (e) expression. Obviously, 
not every test can be fitted neatly 
into only one of these categories.
There is the usual overlap and ambiguity
in the world of reality. However,
with very little effort it is possible
to classify most projective techniques
as involving predominantly one of
these responses. More significant is the fact
that when projective techniques are
classified on this basis the instruments

The reader will note certain similarities
between the present categories
and those proposed by Frank
(1948). However, Frank’s categories 
do not consistently refer to the na- 
ture of the S’s response, for example, 
the refractive distinction points to 
the interpretive process; and, in some 
cases, the distinction implied by his 
labels does not seem empirically 
clear, for example, the distinction be- 
tween interpretive, constitutive, and 
constructive is by no means evident. 
Most important is the fact that his 
categories do not produce the same 
clusters of instruments that the pres- 
ent classification generates. 

Let us characterize very briefly 
each of these types of projective 
techniques and indicate, in an illus- 
trative manner, the individual tests 
that would be included under each 
heading. First of all, are the associative
techniques. Here the S is set to
respond to some stimulus presented 
by the examiner with the first word, 
image or percept that occurs to him. 
Such devices minimize ideation and 
emphasize immediacy. The S is not 
to reflect or reason but, rather, to 
respond with whatever concept or
word, however unreasonable, first 
rises to consciousness, or occurs to 
him.

These techniques, in certain re- 
spects, represent a bridge between ex- 
periment and the clinical setting, for 
in both areas there have been exten- 
sive studies of what happens when an 
individual is asked to respond to some 
stimulus with the first association 
that comes to his mind. It was nat- 
ural that students of the normal, 
conscious, human mind should use
this device as a means of mapping, 
or laying bare, the structure of 
mental events. Further, once Freud 
had devised the method of free association
this approach was accepted
as an important means of gaining
insight into the hidden reaches of the 
mind. It is not surprising, therefore, 
that a number of important tech- 
niques embodying this mode of re- 
sponse have been developed, the most 
popular of which are the Word Association
Test and the Rorschach test.
Also typical of this type of instrument
are Stern’s Cloud Test and certain
auditory projective tests.

Second are the construction tech- 
niques. Here we find a group of in- 
struments that require the S to create 
or construct a product which is typi- 
cally an art form such as a story or 
picture. There is a minimum of re- 
striction placed upon the S's re- 
sponses and in some cases, such as the 
blank card of the Thematic Apper- 
ception Test, even the original stim- 
ulus is under little control by the ex- 
aminer. 

The focus of this type of instrument
is upon the outcome, or product,
constructed by the S and not
upon his behavior or style in the
process of creating or responding.
The S is set to provide a product that
is meaningful and personally relevant 
to the eliciting stimuli. The response 
process may begin with simple asso- 
ciation, but the requirements of these 
tests force the S to modify and elab- 
orate the original association, so as to 
satisfy normative requirements for 
what constitutes a story or other art 
form. Unlike the associative tech- 
niques, these instruments require the 
S to engage in complex, cognitive ac- 
tivities beyond mere association. Il- 
lustrative of these devices are the 
Thematic Apperception Test, the 
Blacky Pictures, and the Make-A- 
Picture-Story Test. 

Third, we find the completion tech- 
niques. These measures provide the 
S with some type of incomplete product
and the requirement that he complete
it in any manner he wishes.
They differ from the associative tech- 
niques in that both the stimulus and 
the response are typically much more 
complex and thus the response is less 
immediate. Furthermore, the com- 
pleted product is usually expected to 
meet certain external standards of 
good form or rationality, e.g., there 
are rules about what constitutes a 
sentence or a story and they presum- 
ably operate to determine the S’s 
completions. When compared to the 
construction techniques, the re- 
sponses elicited by these instruments 
are generally simpler and more re- 
stricted. The best known example 
of this type of instrument is the 
Sentence Completion Test, but 
equally typical are the Picture Frus- 
tration Study and argument comple- 
tion and story completion tech- 
niques. 

Fourth are choice or ordering tech- 
niques. These instruments resemble 
the associative measures in the sim- 
plicity of the response set provided 
for the S. Here the respondent
merely chooses from a number of
alternatives the item or arrangement
that fits some specified criterion such
as correctness, relevance, attractiveness,
or repugnance. In some cases,
such as the multiple choice Rorschach
and TAT, these devices mirror other
techniques except that the S is asked
not to produce an association or a
construction but rather to select from
a number of hypothetical responses
the one that seems most appropriate 
to him. The two tests that provide 
the most effective illustration of this 
category are the Szondi Test and the 
Picture Arrangement Test. 

Fifth are the expressive techniques.
As a class, these methods represent a
bridge between the diagnostic and 
therapeutic, for all of them play an 
active role in current therapeutic
practice. It is presumed for these
measures that the S not only reveals 
himself, but also that he expresses 
himself in such a manner as to influ- 
ence his personal economy or adjust- 
ment. Typically these instruments, 
as in the case of the constructive 
techniques, require the S to combine 
or incorporate stimuli into some kind 
of a novel production. Unlike the 
constructive techniques, however, 
there is as much emphasis upon the 
manner or style in which the product
is created, as upon the production itself.
In other words, the chief distinction
between these measures and
constructive devices is the assumption
of therapeutic efficacy, and the
greater emphasis here upon the style
or manner in which the constructive
process is carried out. Typical of
these instruments are play techniques
and drawing and painting
techniques, as well as psychodrama 
and role playing devices. 

So much for this simple classification.
It is evident that the person
who wishes a more complex basis for
differentiating projective techniques
can readily introduce additional dimensions.
Thus, if we return to the
six types of classification mentioned
earlier, we can easily construct a

tests according to their typical or
most highly developed use, in order 
to maximize differentiation. Therefore,
in rating “mode of response” the
Rorschach is assigned a double plus
(++) for the “associative” row, and
no rating for the “choice or ordering” 
row, in spite of the fact that the test 
is sometimes given in group forms 
that involve choice or ordering. It 
must be admitted that not every one 
of the judgments registered in this 
table would meet with unanimous 
approval among other psychologists, 
but the majority of the decisions are 
quite evident and with a little further 
definition of terms, and specification 
of standards of judgment, would be 
made consistently by most trained 
psychologists. 

Thus, we are able to construct a 
profile for each projective test, repre- 
senting its classification according to 
a variety of criteria. Moreover, if we 
wish, we can readily compute coefficients
of similarity or discrepancy to
indicate the amount of association 
between the various instruments on 
these ratings. Illustrative of such an 
approach is Table 2 which presents a 
matrix of deviation coefficients (Os- 
good’s and Suci’s D index [1945]) 
that estimate the degree of associa- 
tion between the instruments as they 
were rated in Table 1. The reader 
should note that computation of 
these indices involved the minor sin 
of overlooking the fact that our rated 
dimensions are not orthogonal. 
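
The computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: D is taken here as the square root of the summed squared differences between two rating profiles, the numeric coding of the ratings (0 for no rating, 1 for a plus, 2 for a double plus) is assumed rather than taken from Table 1, and the names `d_index`, `d_matrix`, and `EXAMPLE_PROFILES`, along with the ratings themselves, are hypothetical.

```python
from itertools import combinations
from math import sqrt


def d_index(profile_a, profile_b):
    """Distance between two rating profiles: sqrt of summed squared differences."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be rated on the same dimensions")
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def d_matrix(profiles):
    """Map each unordered pair of instrument names to its D index."""
    return {
        (name_a, name_b): d_index(profiles[name_a], profiles[name_b])
        for name_a, name_b in combinations(sorted(profiles), 2)
    }


# Hypothetical ratings, NOT the values of Table 1:
# 0 = no rating, 1 = plus, 2 = double plus, on five illustrative rows.
EXAMPLE_PROFILES = {
    "Rorschach": [2, 0, 0, 0, 1],
    "TAT": [0, 2, 0, 0, 0],
    "Word Association": [2, 0, 0, 0, 0],
}

if __name__ == "__main__":
    # List the pairs from most similar (smallest D) to least similar.
    for pair, d in sorted(d_matrix(EXAMPLE_PROFILES).items(), key=lambda kv: kv[1]):
        print(pair, round(d, 2))
```

A smaller D means more similar profiles. With eleven instruments the same function yields 11 x 10 / 2 = 55 pairwise indices, which agrees with the number of indices mentioned in the text.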

A further step in the analysis is 
represented in Table 3 where we find 
illustrative clusters of techniques 
that seem to be similar to each other 
in their profiles. All of the tests in- 
cluded in the same cluster are linked 
by a D index that would place them 
in the lower quartile (most similar) 
of the 55 indices presented in Table 2.
The clusters are presented in order
of decreasing internal homogeneity.
How meaningful are these clusters, 
and how do they compare with the 
groupings based on our classification 
according to type of response? To 
begin with, it should be noted that 
three of the clusters (Sentence Com- 
pletion Test and P-F Study; TAT, 
MAPS, and Blacky Pictures; Szondi 
and Picture Arrangement Test) are 
identical with the clusters that would 
have been derived from the single cri- 
terion of “type of response.” Fur- 
thermore, it is clear that according 
to this matrix of indices the Word 
Association Test is the most indi- 
vidual of all the instruments, for its 
lowest D index is appreciably higher 
than those indices linking the tests 
that we have clustered together. 
Finally, it turns out that the Ror- 
schach test is singularly difficult to 
classify. It is clustered with drawing
(painting) techniques and psycho- 
drama on the basis of low D indices, 
but it also shows considerable sim- 
ilarity to the MAPS Test and the 
TAT, although not with the Blacky 
Pictures. All in all, the classification 
that emerges from this somewhat 
tedious and difficult method of anal- 
ysis bears a strong resemblance to 
the simpler classification we have pre- 
sented, and at the same time it repre- 


red outcome, 
at this point 
Co: e is any su- 
Periority to such an approach, 

TABLE 3
PROJECTIVE TECHNIQUES CLUSTERED

Cluster 1                  Cluster 2                    Cluster 3                         Cluster 4                  Cluster 5
Sentence-Completion Test   Thematic Apperception Test   Rorschach                         Szondi Test                Word Association Test
P-F Study                  Make-A-Picture-Story Test    Drawing and Painting Techniques   Picture Arrangement Test
                           Blacky Pictures              Psychodrama

This paper has
pointed to the importance of establishing
some consistent basis for classifying
projective techniques, and has
considered a number of possible approaches
to this problem. A classification
based upon the mode of response
elicited from the S was identified
as most promising, and it was
suggested that projective technique
responses can be divided meaningfully
according to whether they involve:
association, construction, completion,
choice (ordering), or expression.
Moreover, when this classification
was compared with a more complex
(multidimensional) taxonomy
there seemed to be little basis for preferring
the more cumbersome method
of classification.
PSYCHOLOGICAL PROBLEMS RELEVANT TO
BUSINESS AND INDUSTRY¹


MASON HAIRE 
University of California 


Many of us do not realize the rate 
of growth in the segment within psy- 
chology devoted to problems related 
to industry. In 1949, Dennis pointed 
out that there were only 105 Fellows 
of the APA in Division 14, and that 
this was less than one per million of 
national population. By 1958 there 
were 250 Fellows and 409 members. 
When this is made a fraction against 
the denominator of national popula- 
tion statistics, it is still a very small 
one. However, it is a tremendous rate 
of growth. By 1958 Division 14 was,
in terms of Fellows, the fourth largest
division with 7% of the total number
of Fellows; in terms of rate of
growth its increase of 238% since
1948 is second only to the school psychologists’
rise of 339%. Moreover,
Division 14 does not reflect much of
the psychological work—in social
psychology, communication, role theory,
and the like—relevant to industrial
problems. If it were possible
to take these into account, the
growth and proportion would be
even more striking.

¹ This paper is part of a larger plan of
papers, written at the request and suggestion
of the Ford Foundation Program in Economic
Development and Administration.
Companion papers by Paul Lazarsfeld and
Robert Dahl appear in the American Journal
of Sociology and The American Political Science
Review, respectively. The aim of the
three, covering psychology, sociology, and
political science, is to indicate research areas
in the social sciences related to problems of
business and industry. The method will be to
illustrate research areas by describing conceptual
development in a kind of “main currents
of thought” manner. The tremendous
volume of publication in the field makes anything
like an exhaustive review impossible; it
is only possible to pick special cases, and these
where they illustrate turning points in thinking
or point to research possibilities in the future.
Further, it is sometimes necessary to make
arbitrary limitations following traditional academic
book-keeping practice; for example,
where industrial social psychology shades imperceptibly
into industrial sociology.

Dennis also pointed to the fact 
that in 1948 over half the Fellows of 
Division 14 were employed in col- 
leges and universities, suggesting 
that this indicated surprisingly little 
contact with industrial problems. 
Roughly the same kind of population 
figure holds today, but one might 
draw another implication from it. In 
spite of its adjectival title, suggestive 
of an “applied” character, industrial 
psychology is and always has been 
primarily an academic discipline. 
Rooted in psychological concepts in 
individual differences, motivation, 
social and experimental psychology, 
its inception and the course of its 
development have been primarily de- 
termined by the developing nature 
of psychological theory, rather than 
the pressures of exigencies within 
business and industry. The business 
of applying psychology to industry 
has been the application of psychol- 
ogy as it developed as psychology, 
rather than a reshaping of the field
by industrial pressures. A good text 
for a course in industrial psychology 
could well be a history of psychologi- 
cal problems as they appear in the in- 
dustrial setting rather than a history 
of industrial problems. To be sure, 
outside developments are evident in 
the history of industrial psychology. 
The development of the Civil Service 
gave impetus to studies of merit rat- 
ing and of selection. The creation of 
collective bargaining initiated a 
whole new set of psychological prob- 
lems (Haire, 1957a) and the com- 
plexities of war-time equipment 
spurred human engineering studies. 
More recently, the increase in size of 
organizations, the tight labor market 
which put emphasis on nonfinancial 
incentives, and the increasing ra- 
tionalization of jobs gave promi- 
nence to a set of social psychological 
problems. 

External pressures have had an in- 
fluence, but the development has 
been primarily within the subfield 
and within psychology. It is in these 
terms that this paper will try to draw 
the developing lines. The main de- 
terminants seem to be growths within 
the general body of psychological 
thinking in major areas such as motivation,
individual differences, and
the like. Within this framework
there is a separate set of narrower,
more specific subfields of industrial 
psychology itself which have a de- 
velopmental history of their own. Fi- 
nally, cutting across these are the 
historical developments outside the 
field which exert a special influence 
from time to time. 


THE SPECIAL AREAS OF 
INDUSTRIAL PSYCHOLOGY


It seems useful to distinguish three 
traditions within the field. They 
have quite different conceptual bases, 
their developmental histories are not 


at all the same, and they have rela- 
tively little contact with one another. 
The first is the field of personnel psy- 
chology, flowing from the tradition 
of individual differences; the second 
is human engineering, growing out of 
applied experimental psychology; and 
the third is less compact, what might 
be called industrial social psychology, 
although it includes some individual 
problems of motivation and the like. 

The first two—personnel psychology
and the human engineering approach—stand
sharply apart. The
approach from the side of individual 
differences, applied to industry, seeks 
to identify the best man for the job. 
It uses essentially a correlational 
method and viewpoint, and the at- 
tention is focused on within-group 
variance in sharp distinction to the 
human engineer whose main interest 
is in the between-group variance, or 
the effects of treatments. The aim of 
the engineer is not to find the right 
man for the job, but as Chapanis 
(1952) says, “to make the job fit the 
man—any man.” Cronbach puts the 
distinction well (1957) when he says 
the experimentalist attacks nature, 
seeking to modify the environment, 
while the correlationist approaches 
nature like a lover, taking her as she 
is and studying her as such. There 
is more than a deep conceptual difference
between them; they are diametrically
opposed in practice. Every
step of success the engineer has in
making the job fit Everyman destroys
part of the reason for being of
the selection tester, who depends on
the fact that not everyone can do the
job equally well. In the reverse direction,
it is also true, but less compelling,
that if the effectiveness of
selection had been greater it would
have left less room for the work of
the human engineer. This is not quite
so true as the obverse, since the design
of equipment would always be
useful in special situations—where
the task required skills which were in
scarce supply, in a tight labor market,
or in a military situation where
the characteristics of the labor force 
may be inflexible, and the demands 
of the equipment beyond much im- 
provement by classification. In a 
way it is strange that the liberal tra- 
dition of remaking the world to make 
it fit Man better, in contrast with the 
relatively passive caste-ridden Dar- 
winian approach of the correlationist, 
should come from something called 
Engineering Psychology. The social 
philosophy seems to be grounded less 
in ideological conviction, however, 
than in a response to the complexities 
and demands of equipment. 

The philosophy of the social the- 
orist is a little harder to put concisely 
than the other two, but he stands 
quite far apart from them. The his- 
torical background is more complex. 
It includes some sociological tradi- 
tions and a group of diverse psycho- 
logical fields. The classic Manage- 
ment and the Worker (Roethlisberger 
& Dickson, 1939) marked the clear 
emphasis on certain aspects of group 
structure and social motivation. 
Moreno added others, and Lewin and 
the group dynamicists still others. 
Some of the philosophy of the instinctivists
has remained in the interest
in motivation and attitude
measurement, and the communication
theorists, in dealing with both
structure and process, draw on a 
variety of fields. There is a phenom- 
enological trend apparent as the field 
of social perception is brought to 
bear on industrial problems. In some 
ways the social approach seems closer 
to that of the engineer than to the 
correlationist. Both the social psychologist
and the human engineer are
built on the idea that the job may be
redesigned to maximize the utilization of
potential. The social theorist,
however, turns from the relatively
simple sensory and motor
problems of the human engineer to 
manipulate quite different variables. 
He will try to arrange work situa- 
tions to provide maximum motiva- 
tional satisfactions and to structure 
groups so that their strengths are not 
barriers but aids to the accomplish- 
ment of the organization’s produc- 
tive objectives, and the like. 

These, briefly, are the three sub- 
areas we will consider. Let us look at 
each of them in the light of their con- 
ceptual history and the research 
problems they have raised. 


HUMAN ENGINEERING 


Historically, the human engineer- 
ing approach is probably best traced 
to the early “conditions of work” 
interest. Here, notably, the very 
general environment was the center 
of attention in studies of fatigue, 
lighting, effects of music, and the 
like. The plant tended to be the unit 
of variation and the work group the 
unit of response. In their original 
design, the Hawthorne studies 
(Roethlisberger & Dickson, 1939) 
followed this pattern as late as the 
1930’s. Later, attention narrowed to 
a more nearly individual basis. Al- 
though it does not quite fit chrono- 
logically, the spirit of the time studies 
of Taylor and the motion studies of 
Gilbreth helped to narrow the inter- 
est to smaller processes. 

The present interest—for example, 
in equipment design—uses the indi- 
vidual as the unit of response anal- 
ysis and some intimate contact- 
apparatus as the unit of controlled 
variation. The work begins with 
simple questions like, “how long 
should the handle of the crank be?” 
and “how big should the type be?” 
and goes on to much subtler prob- 
lems of psychomotor control and the 
display of information. Methodologically,
a real advance was made by
the emphasis on the identification of 
error variance associated respectively 
with man and machine in complex 
systems. 

The basic approach of the engineer- 
ing psychologist has been to ask, 
“what abilities does man have?”; 
“what does the end product re- 
quire?’’; and “how can the machine 
be built to go between the two?” It 
has some interesting pressures from, and
implications for, the social system
outside of psychology. It fits in well
with the general tendency to ra- 
tionalize jobs in the interest of min- 
imizing the cost of turnover. If the 
job could be built “to fit the man— 
any man,” it also made it easier for 
the industrial organization to absorb 
variations attendant on business cycles
by appropriate variation in relatively
homogeneous personnel units
requiring a minimum of recruitment 
and training. As the development 
popularly called automation contin- 
ues—the control of machines by ma- 
chines—the picture changes. As 
Drucker (1955) points out, with the 
high cost of automation the area of 
business risk shifts. It becomes im- 
portant to stabilize production—pref- 
erably at maximum output, but im- 
portantly at a stable figure. The area 
of psychological research interest 
tends to shift from the producer to 
the consumer. Ideally, a product de- 
signed to elicit high motivation to 
purchase, with rapid obsolescence 
and a fast waning of interest in the 
item coupled with an equally fast rise 
in the desire for another one like it is 
the psychological insight demanded 
to make automation work economi- 
cally. 

The rationalization of jobs to fit 
Everyman has had implications in 
both the other subareas of industrial 
psychology. On the one hand, to the 
extent to which individual differences 
are minimized, the individual’s feeling
of his peculiar contribution is reduced,
and this diminution in egoistic
need-satisfactions has formed a
fair part of the interest of the indus- 
trial social psychologist. On the 
other hand, the rationalization of 
jobs for Everyman is a tremendous 
waste of existing individual differences,
though the correlationists have
not made an issue of it. Deskilling 
benefits companies taken singly, but, 
to maximize national productivity, 
it would seem wiser to utilize the 
peaks of individual abilities rather 
than the mid-height of the mean. By 
and large, also, though it is not a 
necessary implication, the changes 
resulting from human engineering 
have not maximally utilized the po- 
tential effects of training. 

The human engineer has also be- 
gun to challenge the basis of the tra- 
ditional industrial engineer. In a 
sense, the engineer first invaded the 
field of the psychologist by recom- 
mending certain behavioral proce- 
dures on the basis of the findings of 
time and motion studies. This ap- 
proach, however, traditionally treats 
the machine as a constant and man 
as a variable, exploring the optimal 
movements most suitable to the con- 
tinued operation of a particular ma- 
chine. Except for studies of plant 
layout and work-bench layout, the 
engineer has typically sought to 
change the worker. Engineering psy- 
chology, since it is a planning of the 
machine on the basis of the operator's 
characteristics, was bound to chal- 
lenge this approach. A beginning of 
such a challenge can be seen, for ex- 
ample, in J. S. Brown and W. O. 
Jenkins’ paper (1947) in which they 
propose a reclassification of motor re- 
sponses into static reactions, posi- 
tioning reactions, and movement re- 
actions. These are not divided in 
terms of observed performance on a 
job, like, for example, the “carry
empty” and “carry loaded” categories of
the Therblig notation, but rather are 
based on hypothesized characteristics 
of the operator, and, as such, would 
provide an entirely different ap- 
proach to the problem of time and 
motion study. Likewise, Birmingham
and Taylor (1954) suggest
ways to analyze performances which
lead to entirely different kinds of categories
from the Therblig. Although
it has not developed fully, the view- 
point has the seed of a broadly dif- 
ferent approach to work methods. 
It would search for categories in the 
characteristics of behavior, and tak- 
ing these as givens, flesh out the gen- 
eral principles on which machines 
could be designed to utilize them. 

Specific empirical research areas in 
the field have, in general, fallen under 
three heads: studies of the optimal 
environment, studies of information 
display, and studies of equipment de- 
sign (Fitts, 1951). It is in the nature 
of the field that the published reports 
are highly specific and tend to have a
fragmentary character whose integra- 
tion is only achieved through an over- 
view of the concepts of the whole 
field. The volume of published re- 
ports on individual determinations is 
tremendous, even outside of the con- 
ventional professional journals. The 
Navy (U. S. Navy Special Devices 
Center), for example, in 1955 pub- 
lished a bibliography of 376 refer- 
ences to unclassified project reports 
in these areas. 

One problem within the area has 
shown tremendous growth and seems 
to have the broadest possible impli- 
cations for future research. Chapanis, 
Garner, and Morgan (1949) gave a 
clear and extended treatment to 
the problem of kinds and sources of 
error in man-machine systems. The 
identification of constant and vari- 
able errors and of errors associated 


with the man and the equipment 
early gave the human engineer a 
kind of diagnostic tool to identify the 
places where work would be most 
profitably applied. Subsequently, 
this kind of approach led to an anal- 
ysis of systems errors in which all 
the specific problems of the field con- 
verged: the design of the controls 
may maximize or minimize the varia- 
bility of the operator’s response, the 
display of information similarly in- 
fluences the error, and the serial 
order of responses within an operator 
or from operator to operator becomes 
an important variable. As this tend- 
ency to analyze complex man-ma- 
chine systems in terms of minimizing 
systems errors grows, the engineer 
approaches a field which seems, at 
first glance, to belong more properly 
in the bailiwick of the social theorist: 
the problem of organization theory. 
Within relatively small units—radar 
warning systems, fire-control sys- 
tems, and the like—this approach 
has designated the optimal organiza- 
tion for minimizing error in the sys- 
tem. As it would apply to a larger 
organization—for instance, a busi- 
ness or an industry group—the work 
would be cumbersome and volumi- 
nous, and it would require some new 
techniques for the descriptions of ac- 
tivities in various parts, but eventu- 
ally an organization theory for in- 
dustry should flow from the ap- 
proach of the human engineer. With 
its history of concentration on the 
display of information and of char- 
acteristic responses to certain kinds 
of messages, it probably fits most har- 
moniously with the decision theorists’ 
work on theories of teams (Helmer, 
1957; Marschak, 1952; Radner, 
1953). The introduction of problems 
of uncertainty in such systems (Carter,
Meredith, & Shackle, 1954)
changes the problem considerably
from the usual formulation of the human
engineer, but, still somewhat
paradoxically, one basis for a future 
theory of social organizations seems 
to lie in the field. It is perhaps worth 
pointing out that the approach from 
the analysis of systems error in com- 
plex man-machine systems gives a 
particular kind of organization—i.e.,
one that minimizes error—and that 
it will differ sharply from the social 
theorist who may, for example, be 
interested also in maximizing per- 
sonal and organizational goals simul- 
taneously. 
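
The distinction drawn above between constant and variable errors, and between error arising in the man and in the equipment, can be illustrated numerically. The sketch below is not taken from Chapanis, Garner, and Morgan; the function names and the sample readings are hypothetical, and the combination rule assumes that operator and equipment errors are statistically independent, so that their variances add.

```python
import math
import statistics


def constant_and_variable_error(readings, target):
    """Split errors about a target into a constant part (the mean error)
    and a variable part (the standard deviation of the errors)."""
    errors = [r - target for r in readings]
    constant = statistics.fmean(errors)    # systematic bias
    variable = statistics.pstdev(errors)   # scatter about that bias
    return constant, variable


def system_variable_error(sd_operator, sd_equipment):
    """Combined variable error, ASSUMING the two sources are independent,
    so that variances (not standard deviations) add."""
    return math.sqrt(sd_operator ** 2 + sd_equipment ** 2)


# Hypothetical dial settings aimed at a target of 100 units.
readings = [102, 104, 101, 105, 103]
constant, variable = constant_and_variable_error(readings, target=100)
print(f"constant error = {constant:.2f}, variable error = {variable:.2f}")

# Hypothetical component errors (operator 4.0, equipment 1.0).
print(system_variable_error(4.0, 1.0))   # baseline
print(system_variable_error(2.0, 1.0))   # operator error halved
print(system_variable_error(4.0, 0.5))   # equipment error halved
```

Under the independence assumption, halving the larger source of variable error reduces the system error far more than halving the smaller one, which is the sense in which such an analysis points to the place where work would be most profitably applied.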


PERSONNEL PSYCHOLOGY 


In its simple base, the field of per- 
sonnel psychology rests on a correla- 
tional relationship between a nor- 
mally distributed predictor variable 
(which may or may not be simply re- 
lated to the skills and abilities re- 
quired on the job) on the one hand, 
and another normally distributed 
measure of criterion performance. 
The equally simple matching task is 
to eliminate, on the basis of the rela- 
tively inexpensive predictor variable, 
those with low likelihood of success 
in criterion performance. 

As the field developed, all three— 
the predictor, the criterion, and the 
match—lost their simplicity. The 
predictor went from simple motor 
skills and primary mental abilities to 
job knowledge, and finally to in- 
terests, motives, background, and 
even trainability. The criterion 
broadened into the complexities of 
job families, job description, job eval- 
uations, merit ratings, and multi- 
dimensionality, and has become the 
knottiest problem in the field. 
Matching went from the simple 
mathematics of screening to the 
much more involved problems of 

classification and team construction. 

In some ways the outstanding fact 

in the history of the simple selection 
procedures is the relatively long plateau
of no real progress after a period
of initial improvement. Cronbach 
(1957) puts it that none of the refine- 
ments since 1920 have improved 
practical predictors by a noticeable 
amount. This is a rather extreme 
view, but it certainly seems likely 
that if one could plot all the validity 
coefficients ever reported, they would 
probably form a characteristic learn- 
ing curve, with a rapid initial rise and 
a long plateau as validities settle be- 
low an unattainable asymptote of .40 
or .45. 

Hull (1928) suggested that a prac- 
tical limit for validities might be in 
the neighborhood of .50. Nothing in 
the history of selection testing has 
radically revised this figure after 30 
years. The main source of limitation, 
however, seems to be with the proper 
and adequate definition of the crite- 
rion variable rather than with either 
the predictor or the nature of the 
match. It is almost certain that here, 
as in other areas referred to in this 
paper, further differentiation of the 
criterion is the eventual path to research
progress. Wallace and Weitz
pointed out in 1955 that the criterion
problem leads all other topics in industrial
psychology in lip service, but
trails in work reported. The problem
is especially difficult in this area.
The two problems of relevancy and
reliability of criterion measures are
relatively straightforward. The
problem of the identification of mul- 
tiple criteria and their subsequent 
combination and weighting is prob- 
ably the most difficult but the most 
fruitful. The so-called “dollar” cri- 
terion, or the simple criterion of pro- 
ductivity seems inevitably to hide 
the details of the psychological prob- 
lem involved. On the one hand, such 
criteria leave out other important rel- 
evant measures—e.g., turnover, 
grievances, etc. On the other hand, 
and perhaps more important, such 
criteria are themselves so determined
by a multitude of factors (various
skills and abilities, motivations,
working conditions, and the like) 
that their prediction will probably 
never go far beyond the present 
level. Only the detailed differentia- 
tion of the criterion and its eventual 
reconstitution seem to hold promise 
for raising the general level of valid- 
ities. 

The problem of classification—op- 
timizing the simultaneous match of 
several men and several positions— 
has been a difficult one. Thorndike 
(1949) began the mathematics for a 
simple case, and, more recently, iter- 
ative solutions of the linear program- 
ming type have shown promise. 
Here, also, the maximization of the 
matrix values, while a great advance, 
can never be better than the criterion 
used to scale the values concerned. 
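
The classification problem can be made concrete with a small example. The sketch below does not use linear programming or Thorndike's mathematics; it simply tries every possible assignment, which is feasible only for a handful of men and positions. The score matrix is hypothetical, and the function name is introduced here for illustration.

```python
from itertools import permutations


def best_assignment(scores):
    """Return (total, assignment) maximizing summed predicted performance,
    where scores[i][j] is the predicted value of man i in position j.
    Exhaustive search: workable for a few men, hopeless for many."""
    n = len(scores)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):      # perm[i] = position given to man i
        total = sum(scores[i][perm[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical predicted criterion scores: 3 men (rows) by 3 positions (columns).
scores = [
    [9, 8, 2],
    [8, 3, 1],
    [4, 5, 6],
]
total, assignment = best_assignment(scores)
print(total, assignment)   # 22 (1, 0, 2)
```

In this example the first man's highest score is in the first position, yet the best overall solution places him in the second, because the second man loses much more by being moved. The solution is optimal only with respect to the scores in the matrix, which is the point made above about the criterion used to scale the values.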

Even more discouraging in the 
mathematics of simple selection are 
the implications of a paper by Brown 
and Ghiselli (1953) which is too 
little noticed in the field. Working 
within the simple algebraic relation- 
ships of the correlation between pre- 
dictor and criterion, they provide a 
set of values in which one may read 
the percentage of improvement in 
productivity of a selected over an un- 
selected population. The improve- 
ment depends on (a) the validity, 
(b) the cut-off, and (c) the variance 
in criterial performance. Putting in 
hypothetical but reasonable figures, 
the result is discouraging. Given a 
frequently encountered validity of, 
say, .30, and the ability to reject 
50% of the applicants (having al- 
ready rejected on all other bases but 
the test), and with a ratio of best to 
worst in the criterion of 1.5 to 1, the 
improvement in productivity of the 
selected group over the unselected 
will be only about 3%. Even 3% is 
not to be disregarded as a practical 


matter, but it is not the universal 
panacea that many business men 
have dreamed of—a dream that 
many psychologists, who should have 
known better, have encouraged. 
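
The order of magnitude of this result can be checked with a rough calculation. The sketch below is not Brown and Ghiselli's procedure and does not reproduce their tables. It assumes a bivariate normal predictor and criterion, and it further assumes (an assumption made here, not in the paper) that the best and worst workers lie four standard deviations apart, centered on the mean; the function names and the `sd_span` parameter are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cutoff_for_selection_ratio(ratio, lo=-8.0, hi=8.0):
    """Predictor cut-off (in z units) above which `ratio` of applicants fall,
    found by bisection on the normal distribution function."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - normal_cdf(mid) > ratio:
            lo = mid    # too many applicants pass: raise the cut-off
        else:
            hi = mid
    return (lo + hi) / 2.0


def percent_gain(validity, selection_ratio, best_to_worst, sd_span=4.0):
    """Percentage gain in mean criterion performance of the selected group
    over the unselected group. ASSUMES bivariate normality and that best
    and worst performance lie `sd_span` standard deviations apart."""
    z_cut = cutoff_for_selection_ratio(selection_ratio)
    # Mean criterion z-score of those above the cut-off.
    mean_z_selected = validity * normal_pdf(z_cut) / selection_ratio
    worst, best = 1.0, best_to_worst
    mean = (worst + best) / 2.0
    sd = (best - worst) / sd_span
    return 100.0 * mean_z_selected * sd / mean


# Validity .30, half the applicants rejected, best-to-worst ratio 1.5 to 1.
print(round(percent_gain(0.30, 0.50, 1.5), 2))
```

With these assumptions the gain is about 2.4%, the same order as the figure quoted above; a different assumption about the spread of performance would move it somewhat in either direction. The function also makes visible the three quantities on which the improvement depends: the validity, the cut-off, and the variance in criterion performance.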

Several specific implications seem 
to flow from the Brown and Ghiselli 
paper. For one thing, in practical 
terms, the tremendous leverage 
gained by shifting the cut-off is obvi- 
ously related to the state of the labor 
market. In the past few years we 
have had a very tight labor market 
where it was necessary to hire almost 
everyone to fill jobs. Now, with the 
war-time bulge in the birth-rate com- 
ing out of schools, considerably more 
selectivity is possible, and the practi- 
cal value of even relatively low 
validities increases. We should note, 
also, that these values refer to a 
simple yes-no selection, and do not 
take into account the more compli- 
cated classification problem. 

Again, the tendency to rationalize 
jobs mentioned above cuts across the 
selector’s effectiveness. No precise 
data are available on the ratio of best 
to worst in industrial practice, but in 
highly mechanized production-line 
situations the ratio is said to be about 
1.1 to 1. Such a value gives little 
power to selection. As this trend con- 
tinues to reduce the cost of turnover 
and training, the possible improve- 
ments by selection disappear. It is 
just here that the human engineer’s 
attempt to rebuild the job so all can 
do it equally well can come in direct 
conflict with the tester. 

One area of testing probably bene- 
fits from all the variables in the 
Brown and Ghiselli function—the so- 
called assessment techniques for the 
early identification of high-level tal- 
ent. Although the validities are typi- 
cally much lower, the variance of 
performance in, for example, execu- 
tives, is very much greater than in 
lathe operators, and the percentage 




selected is very much smaller as the 
triangle of the hierarchy narrows 
toward the top. These last two fac- 
tors combine to outweigh the present 
validities and give much more power 
than in the case of selection for cleri- 
cal and manual jobs. In the present 
practical situation there is one special 
danger here. Many of the tests used 
for personality assessment in the in- 
terests of selecting executives are 
general personality tests which are 
not validated for executive perform- 
ance but for the identification of per- 
sonality traits. The relevance of 
these characteristics to successful 
performance is often made by a kind 
of intuitive judgment of what kind of 
person one would like to have on the 
job. Since it is impossible to forecast 
the demands that will be placed on 
the organization in the future, this 
seems to be a dangerous putting of 
all of one’s personnel eggs in one 
basket. Fortunately, at the present, 
the validities are low enough so that 
the consistent use of almost any of 
the personality tests will still provide 
enough error variance to protect the 
organization against poor judgment 
about the qualities it thinks it is se- 
lecting. 

The increase in productivity af- 
forded by selection also leads one to 
think of the possible improvement of- 
fered by training. Unfortunately, the 
literature on the measured effect of 
training programs in industry is 
sparse. This is particularly surpris- 
ing, perhaps, in a science where such 
a large proportion of research and 
theoretical development has been in 
the field of learning. Like the crite- 
rion problem, a great deal is said about 
the advisability of studying training, 
but relatively little has been done. 

There are some studies of the ef- 
fectiveness of training showing posi- 
tive results (Maier, 1953; McGehee 
& Livingston, 1954; Wallace & Twit- 


chell, 1953) and some showing no ef- 
fect (Baxter, Taaffe, & Hughes, 
1953); again the criterion problem is 
an ever-present difficulty. Several 
studies have referred the effect of 
treatment back to a somewhat vague 
determinant known as the “climate” 
of the group or organization (Buchele, 
1955; Fleishman, 1953; Jennings, 
1955). In the absence of precise defi- 
nition of the variable or the way in 
which it operates, this has not proved 
fruitful, although it probably points 
to something which must be dealt 
with later. Certainly much of our 
training in industry takes place 
within a classroom, is measured 
there, and probably never shows an 
effect on the workroom floor (Fleish- 
man, 1953). 

Edgerton (1955) suggests some 
solid advances in the field—“watch 
and do” training films, the Air Force 
work on team training, the Navy's 
work on where training devices are 
needed. In addition, the growth of 
the National Training Laboratory 
and the Western Training Laboratory
has focused the group dynamicists’
interest on the problem. Still,
there is very little in the way of 
evaluation of the effectiveness of 
training. 

There are a number of exhortatory 
articles urging an evaluation of train- 
ing but relatively few comparable to 
Edgerton’s in detail of review and re- 
search suggestion. He proposes a de- 
tailed research program including 
measurement of predictor abilities, 
variation of training methods, and 
criterion measures. In this way it is 
somewhat similar to Cronbach’s pro- 
posal (1957) that a matrix containing 
both treatments (training methods) 
and aptitudes is needed and that the 
interaction factors may be maximally 
effective. It should be pointed out 
that the approach implied in these 
two suggestions is, to some extent, at 




odds with the pure correlationist’s
approach outlined earlier and is a 
step toward a rapprochement with 
the experimentalist. This combina- 
tion of treatment and individual dif- 
ferences has not been explored in 
terms of job design and equipment, 
but there is no reason why it should 
not be. 

For example, micromotion tech- 
niques are well worked out for study- 
ing job performance. A detailed 
study of the way work is done by in- 
dividuals high on the predictor and 
low on the criterion (or vice versa) 
might yield real possibilities for the 
redesign of performance leading to 
the productive criterion. Particu- 
larly when the ability required is 
scarce, such a procedure would be 
useful. In terms of the national pro- 
ductivity, a national sample of, for 
instance, psychomotor skills and the 
redesign of jobs to take advantage of 
them would seem both feasible and 
fruitful. 

The problem of training cuts across 
the selection problem more directly. 
Most validation studies are con- 
ducted against a fixed training crite- 
rion, i.e., the simple pass-fail criterion. 
However, we have little research on 
the prediction of trainability in terms 
of the correlation between a selector 
and the slope of the learning curve. 
In many cases a lower slope and a
higher eventual level seem a real
possibility, and for long employment 
the prediction of both slope and even- 
tual improvement seems important. 
Again, most validities are against 
proficiency criteria taken very early 
in job-life compared to the typical 
tenure of a producer. There is some 
evidence that validities and intercor- 
relations taken at different periods of 
employment would give very differ- 
ent weightings in a predictive bat- 
tery, and Fleishman and Hempel 
(1954) have suggested a change in the 


factorial content of motor skills with 
practice. All of these points seem to 
suggest that in addition to differenti- 
ation of the predicted criterion in 
terms of job-relevant factors the 
elaboration of its psychological com- 
ponents might also be fruitful. 
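
The suggestion that a selector be validated against the slope of the learning curve, rather than against a single early proficiency measure, can be sketched as follows. The data are invented for illustration, the helper names are hypothetical, and a straight-line fit is assumed as a crude stand-in for the learning curve.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Product-moment correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


weeks = [1, 2, 3, 4]
# Hypothetical trainees: (selection test score, weekly output over four weeks).
trainees = [
    (40, [10, 12, 13, 15]),
    (55, [9, 12, 16, 19]),
    (60, [12, 15, 19, 22]),
    (70, [8, 13, 18, 24]),
]
test_scores = [score for score, _ in trainees]
slopes = [ols_slope(weeks, output) for _, output in trainees]
first_week = [output[0] for _, output in trainees]

print("validity against slope:     ", round(pearson_r(test_scores, slopes), 2))
print("validity against first week:", round(pearson_r(test_scores, first_week), 2))
```

In these invented figures the test correlates highly with the rate of improvement and not at all with first-week output, so a validation study run against the early measure alone would discard a useful predictor of trainability.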

In practical terms, the interview is 
still the most widely used selective 
device. Some progress has been 
made in the development of pat- 
terned interviews, biographical data 
blanks, and job knowledge tests to 
support the interviewer, but both be- 
cause of its wide use and relatively 
little development, this area seems 
particularly ripe for research. Since 
Meehl’s distinction between clinical 
and actuarial approaches, the direc- 
tion of development in terms of a 
validation of the interviewer himself 
on an actuarial basis seems the like- 
liest to bear fruit. The study of the 
interviewer and validating him as an 
instrument are surprisingly lacking in 
published research. 


INDUSTRIAL SOCIAL PSYCHOLOGY 


As has been suggested above, it is 
harder to give a compact historical 
background for this subarea than for 
the others. An historical review has 
recently been given elsewhere (Haire, 
1954) and will not be repeated here. 
Likewise, it is harder in this area to 
give the central theme that charac- 
terizes the field. Several of the 
themes must be identified and will be 
dealt with in detail below. The first 
of them is the interest in group proc- 
esses flowing from the impact of 
Lewin, Moreno, and the Mayo school. 
These include the interest in socio- 
metric structure of the group, roles, 
resistance to change, small group 
dynamics, and the social organiza- 
tion of the factory. The second is an 
interest in “the psychology of the 
other one” in the verstehen tradition, 
with an emphasis on an approach 
through perception and a detailed
understanding of the world of experi- 
ence of the subject. A third is the 
presence of a broad humanistic value 
which seems to run through the field. 
It is often not explicit, but lurks im- 
mediately below the surface. For ex- 
ample, the “human relations” point 
of view often seems to suggest that an 
increase in need satisfactions at work 
will increase productivity (Brayfield 
& Crockett, 1955); often, however, it 
seems as if the suggestion is that even 
though it may not increase produc- 
tivity, an increase in need satisfac- 
tions for the worker is a social good. 
Again, Argyris (1957) and others 
(Katz & Kahn, 1952) seem to imply a 
similar calculus: it is possible and ad- 
visable to reduce the achievement of 
organizational goals in order to in- 
crease the achievement of individual 
need satisfactions. A fourth stream in 
the field is a shifting in the manner of 
dealing with motivation. Taylorism 
and the vogue of incentive pay sys-
tems are closely related to the 18th
century economic man. With more 
sophisticated psychological theory 
of motivation this interpretation 
gives way to one in which the drives 
are both more differentiated and in- 
ternalized. Katz and Kahn (1952) 
emphasize the internalization of re- 
ward. The differentiation of motives 
is more complex. In a sense the idea 
of economic man gave way to a re- 
fined instinct theory. Veblen’s In-
stinct of Workmanship (1918), Tead’s
Instincts in Industry (1918) and Wil- 
liams’ Mainsprings of Man (1925) 
are early examples of the transitional 
state. Brayfield and Crockett (1955) 
point to, and Kornhauser and Sharp 
(1932) illustrate, the period when 
the instinct argument was being 
fragmented into complex motiva- 
tional analyses and attitude surveys. 
With motivational sophistication, the 
complexities of modern theory ap-
pear, with, for example, Maier’s em-
phasis on frustration (1955), Stag- 
ner’s interest in the conflict of dual 
allegiance (1957) and the somewhat 
Freudian analysis of McGregor and 
others (1944). 

External influences have been par- 
ticularly pressing in this area. The 
growth in size of industrial organiza- 
tions has tended to destroy mana-
gerial reliance on old face-to-face
relationships and to force a consid- 
eration of small group pressures and 
of formal characteristics of large 
organizations. The influence of ra- 
tionalization of jobs has been men- 
tioned; a reduction of emphasis on 
individual skill raises problems re- 
lated to the kind of satisfactions one 
finds at work. It has also cut across 
the problem of group structure; for 
example, the disappearing role of the 
foreman is partly due to the changing 
organization of skills and responsi- 
bility. Collective bargaining has 
introduced new group problems of 
primary allegiances, and a rich field 
for attitude study as well as the in- 
process study of the bargaining itself. 
The union and its structure has be- 
come a subject for study, quite apart 
from its relation to management. In 
addition to these there are some 
broad social currents related to the 
developments in industry. One of 
these is the increasing profession- 
alization of management which means 
more than a simple specialization of 
function. It is the development of a 
group of specialists not primarily re- 
lated to the product, or, indeed, to 
production, but to the administra- 
tion of large complex social organiza- 
tions and, especially, to the problems 
of dealing with people within them. 
Partly out of this has grown an in- 
terest in a proper philosophy of man- 
agement—an asking of “why” in- 
stead of the traditional “how” which 
should lead the field to the broader 
problems of social philosophy. As the
corporation grew, social psychologi- 
cal problems arose with increase in 
its size. In addition, however, fur-
ther problems arose from the fact
that the legal entity of the corpora- 
tion and its traditional forms of or- 
ganization were frozen when size, 
communication technology, distribu- 
tion technology, and automation of 
production were entirely different. 
In many ways, industry has been less 
flexible in adapting its organization 
to relevant technological changes in 
administration than other large social 
organizations such as the Armed 
Services. This inflexibility creates 
problems of its own. Still another 
general social interest arising out of 
industrial organization is the current 
concern with conformity and the 
Organization Man (Whyte, 1956) 
which makes contact with, but did 
not seem to initiate, the psychological 
research now going on in the field 
(Asch, 1956; Crutchfield, 1955; Tud- 
denham, 1958). Finally, a whole 
group of basic societal values related 
to industry seem to have shifted, and 
to have had an influence on the gen- 
eral trends of interest in industrial 
social psychology. It used to be said 
that the United States was a success- 
oriented culture, with values seated 
in productivity, industriousness, and 
achievement. It is perhaps not ex- 
cessive to say that it is becoming an 
adjustment-oriented culture, with 
values stemming from fitting in with 
the group. We find contradictory 
norms of conformity and inconspicu- 
ousness, on the one hand, inhibiting 
the individual from the singleminded 
following of his own path, and, on the 
other hand, a rejection of industrious- 
ness as compulsive and achievement- 
oriented. In this connection, it is 
often said, jokingly, that the Ford 
Motor Co. has grown so that if “old 
Henry” were alive today there would
be no place for him in the organiza-
tion. It is probably true, and his in-
ability to fit in perhaps would stem
from both of the values mentioned. 
Some of these social values—toward 
conformity and industriousness— 
have been due to the influence of 
psychological thinking; some of the 
psychological thinking about indus- 
trial problems has, in turn, been influ- 
enced by them. In any case, the field 
clearly feels the impact of these his- 
torical developments within industry 
and society. 


Large Group Organization 


In dealing with the problems re- 
lated to group processes and struc- 
tures, it is probably convenient to 
differentiate between large groups 
and small groups. Although there is 
really no fixed borderline between 
them, divergent interest seems clear. 
Historically, the interest in large 
group organization in industrial so- 
cial psychology stems from the view- 
point of Elton Mayo, and particu- 
larly from the major report of 
Roethlisberger and Dickson (1939). 
Mayo’s insistence that “man’s desire 
to be continuously associated . . .
with his fellows is a strong, if not the 
strongest human characteristic” 
(Viteles, 1953, p. 181) leads off in the 
direction of social motives and small 
group processes, but his interest “in 
social organization of the factory” 
began an emphasis on the industrial 
organization as an autonomous mac- 
rocosmos. Within it there is an in- 
terest in role problems and motiva- 
tions for broad functionally defined 
classes, e.g., the white collar worker 
(Mills, 1956), and in stresses within 
the organization such as the relation 
between staff and line (McGregor, 
1948). The hierarchical character of 
the organization appears in connec- 
tion with prestige and status (Bar- 
nard, 1946), though, surprisingly, it 
has not extended into the studies of
communication, where one might ex- 
pect to find hierarchical barriers to 
the flow of information within the 
group. In contrast to the formal or- 
ganization, the informal organization 
which develops spontaneously out of 
the group itself has been seen as a 
source for individual need satisfac- 
tions denied by the formal structure 
(Selznick, 1958). The informal or- 
ganization is also the basis for several 
studies of rumor and communica- 
tion (Back, Festinger, Hymovitch, 
Kelley, Schachter, & Thibaut, 1950;
Festinger, 1950), and for its role in 
morale (Arensberg & McGregor, 
1942). 

In dealing with the large group, the 
field of organization theory has 
shown great activity in recent years 
(Argyris, 1957; Bakke: 1950, 1953; 
Bakke & Argyris, 1954; Barnard, 
1950; Haire, 1955b; Herbst, 1957; 
Simon: 1952a, 1952b, 1955; Weiss, 
1956). The most frequent single 
thread running through this material 
is the conflict between the organiza- 
tion’s goal and the satisfaction of the 
individual’s motives. Is it possible to 
have unity of direction in an organ- 
ization without sacrificing autonomy 
in the individual? Is it possible to
have an hierarchical chain of com- 
mand without sacrificing egoistic 
need satisfactions in the lower rungs? 
Is it possible to have planned ration- 
alized production without sacrificing 
active independence? McGregor 
(1944) suggests that it is. Argyris 
(1957), following a similar line, sug- 
gests that it is, but only at the cost 
of the organization’s objectives. It is 
not perfectly clear in Argyris’s treat- 
ment whether, on the one hand, he 
is dealing with the large organiza- 
tion, and the impeding effects of pol- 
icies, rules, roles, and formal nets, or, 
on the other hand, with the more 
intimate geography of the small
groups, and what Walker and Guest
(1952) call “mass production as a 
code of law.” Barnard (1950) and 
Selznick (1958) both find some solu- 
tion in the existence of the informal 
organization to solve this problem. 
Simon (1955) and the decision the- 
orists in general tend to disregard the 
conflict, assuming a rational man, in 
many ways a more sophisticated 
counterpart of the old-fashioned eco- 
nomic man. This man now maxi- 
mizes strategies by relying on sub- 
jective evaluations and probabilities, 
but the original utility notion is not 
far submerged. 

This conflict between the individ- 
ual’s goals and the organization’s is 
part of the emphasis on broad hu- 
manistic values mentioned above. It 
is often urged that the individual
should be provided with more moti- 
vational satisfaction. Sometimes it is 
explicitly held that an increase in his 
satisfactions will make a more effec-
tive producer. Sometimes it seems
to be implicit that more satisfaction 
for the individual would be a Good 
Thing in any case; this is particularly 
true in dealing with socially valued 
satisfactions such as self-actualiza- 
tion, autonomy, and the like. In 
dealing with the industrial situation, 
this problem, which reappears in 
dealing with attitudes and motiva-
tion, remains unsolved. Under the
heading of organization it is sug-
gested that the organization be op-
erated in such a way that the indi-
vidual’s and the organization’s
achievement of goals be maximized 
simultaneously. If this means that 
there is some ideal form of organiza- 
tion which can simultaneously bring 
about the absolute maxima of both 
at the same time it is never explicitly 
stated. If it means, as it often seems 
to, that some of the organization’s 
goals should be sacrificed to increase 
the individual’s, the calculus is never 
made specific, nor the social philos-
ophy on which it rests made explicit.
In current approaches, the other pos- 
sibility—that the individual’s goals 
should be further sacrificed to in- 
crease the organization’s—seldom 
seems even to lie beneath the surface. 
This is probably because in the last 
20 years we have been in a very pro- 
ductive economy with relatively full 
employment, in which attention is 
directed less toward additional ma- 
terial output than to human values. 
Whatever the reason, the social psy- 
chologist unwittingly becomes a so- 
cial philosopher as he chooses values 
underlying his analyses; the issues in 
the field would be clearer if these so- 
cial philosophies were developed and 
made explicit. 

Organization interest has turned 
to the relation between structure and 
function (Haire, 1955b; Weiss, 1956), 
and Herbst (1957) has reported some 
data applying an input-output anal- 
ysis to the description of social or- 
ganizations. In many ways the most 
surprising thing in the field of or- 
ganization theory is the paucity of 
empirical data. For example, there 
seem to be no empirical histories of 
the growth of organizations in terms 
relevant to the social psychologist. 
In industry and economics there are 
a host of reports on growth and or- 
ganization in terms of invested capi- 
tal, dollar volume, and the like, but 
it is hard to find a history in terms of 
the number of people, what their 
functions were, their relations with 
one another, and the relative growth- 
rates of the parts and the whole. 
Such studies seem an essential base 
for organization theories if they are 
ever to move beyond the kind of pos- 
sibility-spinning that characterizes 
them today. 

The interest in the organization of 
large groups in industry has opened 
one area which seems to have par-
ticular promise for future develop-
ment in the general psychological 
theory of motivation, and for a 
fruitful interaction between psycho- 
logical and sociological approaches. 
Several recent studies seem to be ap- 
proaching a kind of geographical ecol- 
ogy of motivation at work. Walker 
and Guest (1952) made a detailed 
interview study of workers on an as- 
sembly line. They found, for ex- 
ample, that the number of contacts a 
worker had with his fellows at work 
was not only related to his expressed 
satisfactions, but was also related to 
such company-relevant indices as 
turnover, grievances, and the amount 
of pay necessary to keep a man on 
the job. The assembly line almost 
necessarily is stretched out in a long 
thin line, and the very geography 
of production, originally designed 
with only technological merits in 
mind, is discovered to have liabilities 
in human performance. Similarly, in 
England, when technological ad- 
vances made it profitable to mine 
thin coal seams by a “long wall” 
method, the strength of both small 
face-to-face groups and of the larger 
inclusive group was weakened, with 
an increase in accidents and a de- 
crease in production. In the new sys- 
tem, instead of concentrating a small 
group at a seam-face to do a com- 
plete job of loosening, gathering, and 
transporting coal, a shift attacked a 
long seam, loosening coal for an en- 
tire period, after which another shift 
gathered and transported. This 
change so weakened the structure of 
the group (with loss to the company) 
that it was necessary to find other 
ways to rebuild the small group rela- 
tions with the larger organization. 
When this was done, accidents went 
down and productivity went up 
(Trist & Bamforth, 1951). Studies 
like these lead us to see the machine 
and machine-layout less as a tool for 
production than as a part of the
topography of the individual’s work 
place, and, as such, provide a way to 
study the geographical ecology of 
motivation. They seem surprisingly 
similar to the findings of the urban 
sociologists as they investigate the 
change in social organization flowing 
from the change from the wheel- 
spoke towns of the railroad days to 
the modern strip-towns growing up 
along the highways. So far the two 
fields have not come together on the 
problem, but it seems profitable for 
them to do so. At the same time, it 
is worth noticing that such studies 
have implications for the human en- 
gineer’s analysis of work-systems, 
and the geographical ecology of 
motivation adds another dimension 
to the human variables with which he 
should deal in the design of work for 
the optimum utilization of human 
characteristics. 

Jaques’ Changing Culture of a Fac-
tory (1952) brings up the same kind of
problem, though his basis of analysis 
is more topological than geographi- 
cal. Similarly, some of the studies of 
rumor and communication demand 
a kind of grid-system against which 
to see the group processes. Davis 
(1953), in studying “grapevine” com- 
munication systems in industry, 
shows isolates in the network, both 
on the basis of geographical position 
and functional position in the group. 
Weiss and Jacobsen (1955), using a 
very large sociometric matrix of com- 
munication contacts, similarly identi- 
fied both isolates and liaison people 
within the group. In nonindustrial 
studies of rumor transmission (Back, 
et al., 1950) both geographical and 
functional position assume impor- 
tance in communication. Further 
study of the layout of the social 
group seems promising for such proc- 
ess analyses as well as for the ecology 
of motivation. 


Small Groups 


In the interest in smaller groups, 
two lines are evident: the structural, 
which might be seen flowing from 
Moreno’s sociometry, and that of 
group dynamics, stemming from Le- 
win. The Lewinians raise the prob- 
lem of group cohesiveness, though the 
problem has not proven particularly 
amenable to attack either theoreti- 
cally or experimentally. They have 
focused considerable attention on 
group problem solving (Kelley & 
Thibaut, 1955) and group decision, 
and it is from the latter area that the 
biggest influence on industrial studies 
has come. Referring back frequently 
to Allport’s article on participation 
(1945), and, in the most detailed 
study of the group, drawing heavily 
on Lewin’s notion of quasi-stationary 
equilibrium (Coch & French, 1948), 
they have produced a group of in- 
stances in industry where participa- 
tive group decision overcomes re- 
sistance to change. While there is 
considerable evidence that participa- 
tion is effective, we have very little 
suggestion as to why it is. Somehow,
this technique seems able to muster 
the forces which hold the group to- 
gether, and which often are barriers 
to the group’s action, and make them
positive. Historically, the interest 
in small groups was refocused by the 
Hawthorne studies’ discovery of the 
effectiveness of the group in deter- 
mining productivity. They began, in 
the best engineering tradition, by 
asking, “will the introduction of rest 
pauses reduce fatigue and monotony
and hence increase output?” Later, 
in an outburst of serendipity, they 
say, “it was clear that two essentially
different sorts of changes occurred
. . . those changes introduced by the
investigators in the form of experi- 
mental conditions . . . and a gradual 
change in the social interrelations 
among the operators themselves and
between the operators and the super- 
visors. From the attempt to set the 
proper conditions for the experiment, 
there arose indirectly a change in hu- 
man relations which came to be of 
great significance” (Roethlisberger & 
Dickson, 1939, pp. 58-59). The new 
interest in the group’s power to fa- 
cilitate or inhibit change stems from 
this point. 

The field of resistance to change is 
an old one in industrial social psy- 
chology. In 1927 Angles published a 
detailed study of this topic and 
Mathewson devoted a book to it in 
1931, but its base in group processes 
was not clear until it was suggested 
in the Hawthorne studies and vigor- 
ously followed up by the Lewinians. 
Angles, for example, spoke of the 
signs of restriction of output and of 
its causes. He identified the reduc- 
tion in the variance of production 
figures as an indicator of restriction 
20 years before Lewin’s hypothesis 
of gradients of forces around levels of 
performance made it theoretically 
meaningful. Angles interprets re- 
striction chiefly in terms of individual 
factors, speaking of such things as 
fear of rate cutting, physiological 
factors, satisfaction with present 
earnings, and the like. He does men- 
tion (Angles, 1927, p. 250) that “the 
practice seems to rest primarily on a 
strong sense of courtesy to one’s 
mate...” and later, “loyalty to 
one’s work-mate is usually so strong 
that a higher position with better 
wages in the same factory is not de- 
sired if it would mean separation 
from one’s set of neighbors and 
friends. This is more marked in 
women than in men.... The herd 
instinct seems to operate in inverse 
ratio to the skill required for the 
work, and thus it quite commonly 
overcomes natural acquisitiveness.” 
Two things seem interesting in this
early quote: one, the fact that the
force associated with the relations 
among workers was identified as part 
of the restriction of output and re- 
sistance to change, though it was not 
emphasized, and, two, the fact that 
it was still permissible to speak in 
instinct terms; the more modern 
terminology of social motives and 
restraining forces had not yet taken 
over. We seem clearly to have pro- 
gressed in the more modern state- 
ments. It is now possible to identify 
and manipulate some of the factors in 
the group. However, further work on 
group processes outside the industrial 
field should lead us to more precise 
determinations of the source and na- 
ture of the problem at work. 

In the industrial situation, the area 
of restriction of output is a somewhat 
strange case in which the very phe- 
nomenon has changed, partly be- 
cause of its recognition and study, 
and partly because of a change in the 
environment in which it operates. 
Where originally it seemed firmly 
grounded in group cohesiveness and 
social motives, it has become appar- 
ently a much more conscious tool of 
bargaining. While at all times there 
seemed to be an element of fear of 
rate change, and a protest against 
what were perceived as too high rates 
of work, more recently the restriction 
of output seems to be used as a de- 
liberate tool preliminary to bargain- 
ing for something else. For example, 
in the building trades and among railway
workers and longshoremen, restric-
tion of output has led to fixed work
schedules—either shorter work-weeks 
or schedules based on units of produc- 
tion, with overtime pay for what 
would be normal productivity before 
restriction. Although the overt be- 
havioral signs may remain the same, 
the psychological context underlying 
them may have changed radically to 
a more conscious use of the group’s 
power to gain a leverage in a bargain-
ing situation.

The small group theorists have also 
markedly changed the interpretation 
of leadership. Particularly in the in- 
dustrial setting, the traditional view 
of the leader in the past was of a 
charismatic individual possessing the 
trait of leadership. A great deal of re- 
search went, without much success, 
into attempts to identify these qual- 
ities of leadership in the interest of 
selection. The emphasis on the group 
and group processes led to a descrip- 
tion of the leader in the mold of The 
Admirable Crichton—a man who, in 
the particular situation, possessed the 
skills and abilities to provide means
for the satisfactions of the needs of
group members. We now speak of 
the “emergent” leader, we distin- 
guish between “headship” and “lead- 
ership,” and we use “buddy ratings” 
to identify leaders. The pendulum 
has swung a long way to one side, 
and the reverse trend is already dis- 
cernible in the assessment field, in a 
return to the search for the qualities 
of leadership within the individual 
instead of the group. However, even 
as the pendulum swings back, it is 
more group-oriented, and the vari- 
ables tend to deal with relations with 
others rather than decisiveness, force-
fulness, and determination. For in-
dustrial studies of leadership, the
problem will surely have to be re- 
ferred to the growing body of studies 
of organization theory. For example, 
as the size of the group increases, the 
simple ability to hold the group to- 
gether becomes one of the most im- 
portant problems, and, in many 
cases, a corporate entity seems to di- 
rect itself a large proportion of the 
time, with the leader’s job being to 
keep it working as a unit, to keep in- 
formation flowing through it, and to 
adjust the parts to one another. 
Again, as the study of group process
progresses, particularly as it moves
toward organization theory, the re- 
statement of the problem of leader- 
ship seems a likely fruitful outcome. 

In the tradition of the sociometric 
analysis of the internal structure of 
groups, there is a lively development. 
Danzig and Galanter (1955) and 
Weiss and Jacobsen (1955) have 
proposed sophisticated sociometric 
analyses of industrial groups and dem- 
onstrated their applicability empir- 
ically. The techniques are computa- 
tionally cumbersome, but they are 
encouraging in the indication that it 
is possible to work with groups of at 
least two or three hundred, and that 
fruitful results may come from such 
analyses. The first of these studies 
provides operational meaningfulness 
to the concepts of cohesiveness, social 
distance, and centrality, using the 
terms somewhat as Bavelas (1948)
did in dealing with communications 
nets. Developed further, this would 
mean that it was possible to define all 
the radii of an industrial group at a 
given point in time, and, hence, to 
describe the shape of the organiza- 
tion in a much more functional man- 
ner than in the traditional “family 
tree” organization chart. If it were 
possible thus to represent the shape 
of the organization it would allow us 
to do a kind of longitudinal study 
which has been so far unavailable to 
social psychologists. The anthropo- 
metric studies in the area of child de- 
velopment have been fruitful sources 
of understanding. A similar historical 
picture of the growth of social or-
ganizations would seem equally
promising if the means of represent- 
ing shape and structure at various 
stages becomes available. The Weiss 
and Jacobsen study points to an-
other similar possibility. They iden-
tified subgroups on the basis of the
kinds of contacts which occurred— 
separated work groups, liaison peo-
ple, and isolates. Of their group of
200, about 18% were tentatively 
identified as having a liaison func- 
tion. Again, this kind of analysis 
could provide a measure for the 
growth of organizations with which 
we could see the rate of demand for 
liaison as the size increases. Such 
studies could give us insight into the 
stresses within the system arising as 
a result of growth. 
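
A small sketch may make the idea of such a measure concrete. The
Python fragment below takes a list of reported contacts, treats a
person with no contacts as an isolate, and treats as a liaison person
anyone whose removal would split the remaining contact network into
more pieces. This cut-point criterion is a simplifying stand-in and is
not the procedure used by Weiss and Jacobsen; the names, the
eight-person group, and the contact list are all hypothetical.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(neighbours[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def classify(people, contacts):
    """Return (isolates, liaisons) under the cut-point assumption."""
    in_contact = {p for pair in contacts for p in pair}
    isolates = [p for p in people if p not in in_contact]
    connected = [p for p in people if p in in_contact]
    baseline = len(components(connected, contacts))
    liaisons = []
    for p in connected:
        rest = [q for q in connected if q != p]
        remaining = [(a, b) for a, b in contacts if p not in (a, b)]
        # A liaison person is one whose absence fragments the network.
        if len(components(rest, remaining)) > baseline:
            liaisons.append(p)
    return isolates, liaisons


# Hypothetical group: two work groups (A, B, C and D, E, F), one person
# (L) in contact with both, and one person (G) with no contacts.
people = ["A", "B", "C", "D", "E", "F", "G", "L"]
contacts = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("L", "A"), ("L", "B"), ("L", "D"), ("L", "E"),
]

isolates, liaisons = classify(people, contacts)
liaison_share = len(liaisons) / len(people)
print(isolates, liaisons, liaison_share)
```

Computed for the same organization at successive sizes, a quantity
like liaison_share would give one crude index of the demand for
liaison as the organization grows.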

Also in the sociometric tradition, 
although a long way from its origin, 
would be studies like Cartwright and 
Harary’s (1956) use of graph theory 
to deal with symmetry and asym- 
metry in attitudes within the struc- 
ture. French (1956) uses a similar 
approach to deal with social power. 
Such sociometric analyses have been 
urged as a management tool, or, as 
the sociometrist develops toward the 
study of interaction process, as a pre- 
dictive device (Chapple, 1953). The 
study of group structure through 
interaction process analysis has de- 
veloped both in technique and theo- 
retical statement (Bales, Flood, & 
Householder, 1952). It is still cum- 
bersome, but in some ways it is the 
natural bridge between the static 
structuralism of the pure sociometric 
tradition and the dynamics of the 
Lewinians. A special variety of the 
study of interaction structures within 
groups is the field of communication 
nets and their relative effectiveness. 
Most of the work stems originally 
from Bavelas’s model (1948) and 
there has been a certain amount of 
sameness in both the form and find-
ings of the empirical research
(Bavelas, 1950; Heise & Miller, 1951; 
Leavitt, 1951; Leavitt & Mueller, 
1951; Shaw, 1955; Shaw & Roths- 
child, 1956). However, in view of the 
central role of communication and 
communications nets in the viability
of industrial organizations, this must 
be seen as a point from which re-
search progress will flow.
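
The notion of centrality in a communication net can be illustrated
with a short calculation. The sketch below assumes the index usually
attributed to this line of work: the sum of the distances between all
positions in the net, divided by the sum of the distances from a given
position to every other. The two five-person nets (a chain and a
wheel) and the function names are illustrative assumptions, and the
calculation presumes that every position can reach every other.

```python
from collections import deque


def distances_from(start, links):
    """Breadth-first search: links on the shortest path to each position."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for nxt in links[here]:
            if nxt not in dist:
                dist[nxt] = dist[here] + 1
                queue.append(nxt)
    return dist


def relative_centrality(links):
    """Total of all distances in the net over each position's own total."""
    totals = {p: sum(distances_from(p, links).values()) for p in links}
    grand_total = sum(totals.values())
    return {p: grand_total / t for p, t in totals.items()}


# Hypothetical five-person nets; each entry lists a position's direct links.
chain = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
wheel = {
    "C": ["A", "B", "D", "E"],
    "A": ["C"], "B": ["C"], "D": ["C"], "E": ["C"],
}

chain_index = relative_centrality(chain)
wheel_index = relative_centrality(wheel)
print(chain_index)
print(wheel_index)
```

On this index the hub of the wheel scores 8.0 and the middle position
of the chain about 6.7, while the end positions of the chain score
4.0, so that differences between nets in the concentration of
centrality can be stated numerically.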

One other area of small group prob- 
lems seems particularly relevant to 
economic behavior, although it is not 
possible to point to any considerable 
body of published work at present. 
Economic theory has always included 
hypotheses about both human moti- 
vation and interactions between peo- 
ple. Recent developments, following 
historically, though not necessarily 
logically, from the theory of games, 
contain two possibilities especially 
relevant for small group theory and 
research: utility curves for shared 
risk, and variable information in stra- 
tegic decision matrices. It is not pos- 
sible to do more than point to the ex- 
istence of the problems here. It is 
possible to construct psychological 
curves (rather than strictly logical) 
for the risk an individual will run to 
gain a given objective. Such a func- 
tion contains a multitude of factors, 
expectations, risk-taking proclivities, 
present position, and the like. The 
empirical curve is typically not mon- 
otonic. Such risks can be experi- 
mentally shared, in the fashion of 
pooled insurance risks, but it is not 
necessarily true that the new value 
will fall at the point indicated by di- 
viding the risk and return by the 
number of sharers. The field suggests 
possibilities for a joint study of group 
structure and process and the eco- 
nomic area of risk-taking. On the 
other hand, strategic decision theory 
tends to rely on complete information 
or complete lack of it among partici- 
pants in competitive situations; a 
situation which facilitates model 
building but does not approximate 
reality very closely. Again, it seems 
possible to bring to bear the tech- 
niques of the communication-net re- 
search and the work on cohesiveness 
in groups to add a realistic dimension 
to empirical studies of strategic deci- 
sions. 
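
The point about shared risk can be made concrete with a toy
calculation. Assuming, purely for illustration, an exponential utility
function with a hypothetical risk tolerance, the sure amount a person
would accept in place of a gamble (its certainty equivalent) can be
computed for the whole gamble and for an equal share of it. The
utility function, the parameter values, and the function names are
assumptions of the sketch; the empirical curves discussed above need
not take this form.

```python
from math import exp, log


def certainty_equivalent(outcomes, risk_tolerance):
    """Sure amount equal in utility to a gamble under u(x) = 1 - exp(-x / r).

    outcomes is a list of (probability, payoff) pairs.
    """
    term = sum(p * exp(-x / risk_tolerance) for p, x in outcomes)
    return -risk_tolerance * log(term)


def share(outcomes, n):
    """Divide every payoff of the gamble equally among n sharers."""
    return [(p, x / n) for p, x in outcomes]


# Hypothetical gamble: even chances of gaining 100 or losing 60.
gamble = [(0.5, 100.0), (0.5, -60.0)]
risk_tolerance = 50.0   # assumed parameter of the utility function
sharers = 4

alone = certainty_equivalent(gamble, risk_tolerance)
per_sharer = certainty_equivalent(share(gamble, sharers), risk_tolerance)
naive = alone / sharers   # value obtained by simple division

print(alone, per_sharer, naive)
```

With these illustrative figures the gamble has a negative value for a
single person but a positive value for each of four sharers, so the
shared value does not fall at one-fourth of the individual value.
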


Motivation


In many ways the history of the at- 
tempt to deal with motivation in in- 
dustrial social psychology covers a 
large part of the whole field. Selec- 
tion devices in the strict tradition of 
individual differences were soon seen 
to be inadequate on the simple level 
of skills and ability. Motivations 
were introduced here as individual 
variables in interest inventories like 
the Strong and Kuder. As the 
individual came to be studied in the 
context of the industrial organiza- 
tion, stress was put on higher or- 
der needs—for example, loss of so- 
cial need satisfaction on the one 
hand through mishandling of group 
structure, and of egoistic need satis- 
faction on the other through de-skill- 
ing in the rationalization of jobs. 
Surveys of attitudes moved the moti- 
vational interest from the level of the 
individual to the group average. 
Questions were still asked individ- 
ually, but the results, in terms of 
both the responses and the criterion 
(usually production), were from the 
total group. With somewhat dis- 
couraging results in a simple relation- 
ship between attitudes and criterial 
performance, interest shifted to 
studies of attitude change, and the 
“human relations” movement is 
largely a program of development of 
attitudes among leaders to manipu- 
late the “climate” seen as a variable 
in early group studies. Finally, it is
largely the interest in motivation 
that shapes psychological studies of 
organization, perhaps stemming from 
the early Lewinian suggestion that it 
is easier to reorganize the structure 
of the whole group than to change 
the individual and leave the group as 
it was. Within this sketchy overview 
lies a large proportion of the re- 
searches in industrial social psychol- 
ogy, a tremendous body of relatively 
unfruitful work on attitudes, and
some promises for future progress.

In general, the studies of attitudes 
and productivity stem from the law 
of effect notion in learning theory: or- 
ganisms tend to seek out situations 
that are rewarding and avoid those 
that are punishing. Katz and Kahn 
(1952) make this explicit for human 
relations, Haire (1957b) for indus- 
trial leadership, and Brayfield and 
Crockett (1955) for attitude studies. 
In hoping to predict high perform- 
ance from positive attitudes, the rele- 
vance of the principle is not perfectly 
clear, and, indeed, a summary of a 
host of empirical investigations would 
suggest that a positive relation be- 
tween the two is tenuous at best. On 
the positive side, the law of effect 
would involve the worker finding 
satisfactions on the job, and these 
satisfactions would be revealed by 
appropriate attitude surveys. It is 
not necessarily true, however, that 
the pursuit of these satisfactions 
would lead to productivity. On the 
negative side, lack of satisfactions 
might be expected to lead to avoid- 
ance of the job-situation, and, since 
presence is a necessary condition for
productivity, to reduced criterial per- 
formance. Indeed, some studies do 
suggest a relation between negative 
attitudes and withdrawal in the form 
of turnover, accidents, and absen- 
teeism (Brayfield & Crockett, 1955). 
On the positive side, however, we 
seem to need considerably more dif- 
ferentiation of motivational theory 
to serve as an intervening variable 
between surveyed attitudes and ob- 
served productivity, and better co- 
ordinating definition between the 
motivational constructs and the
realities of the industrial situation.

A step in this direction is found in 
motivational analyses. The Michi- 
gan studies of productivity and mo- 
rale (Katz, Maccoby, Gurin, & Floor, 
1951), for example, introduce intervening
steps of group-oriented versus
task-oriented supervision between 
attitude and the criterion. Without 
this kind of differentiation, the meth- 
odological elegance of scaling and 
the development of subscales in atti- 
tude surveys seems likely to be of 
little use in the future on this prob- 
lem, as it has been in the past. The 
studies of Walker and Guest, referred 
to above, broke attitudes down into 
social satisfactions associated with 
contacts on the job, egoistic satisfac- 
tions associated with feeling of 
achievement at work, and the like. 
While these studies did not relate the 
findings to criterial performance, 
they provide a step toward it. Sim- 
ilarly, a group of other studies (Katz 
& Kahn, 1954; Schaffer, 1953; Wick- 
ert, 1951) suggest the possibility of 
identifying and producing egoistic 
satisfactions at work, and, in some 
cases, of relating them to criterial 
performance. 

The assumption that high levels of 
satisfaction of the individual’s needs 
will be related to productivity does 
not seem to hold. Put in these terms 
it is not surprising; there is no reason 
to believe that the individual’s goals 
and the organization’s will coincide. 
A more detailed understanding of the 
kinds of satisfactions determining be- 
havior at work (productive and 
other) and, particularly, the ecology 
of drives at work suggested above 
would seem to be needed before sim- 
ple relationships between morale and 
productivity can be hoped for. 

One area of motivational studies is 
surprisingly lacking—the study of 
motivation of management. While 
psychologists have inveighed vigor- 
ously against the oversimplifications 
of the Economic Man as a motiva- 
tional model for workers, I suspect 
that we have largely kept it in inter- 
preting management behavior. The 
manager is seen generally as actuated
by money and power as motives, almost
as if there were a difference in
kind between superiors and sub- 
ordinates. A broad program of in- 
vestigation in this area would seem 
worthwhile; it would also tie in well 
with the assessment approaches to 
the identification of high level talent. 
As it develops, such an interest would 
go beyond the simple motivational 
interpretation of behavior, and, per- 
haps, lead us to a statement of what 
might be called a philosophy of man- 
agement. Now, when the corpora- 
tion is taking on a very different role 
in the national social structure, and 
when the professional manager is a 
dominant figure in the corporation, 
the general value systems in which he 
operates become of especial impor- 
tance. The other side of this coin, in 
a sense, is the social role of the man- 
ager. The American manager has a 
quite different role in the view of the 
public from the position of the man- 
ager in most of the other Western 
countries. Indeed, much of the 
strength of American industry seems 
to flow from this. It would seem an 
appropriate area for study to investi- 
gate this closely. Such work would 
be closely related to the companion 
studies of role perceptions of man- 
agers of themselves and of their posi- 
tions, and of the influence of hier- 
archical levels on self-perception. 

It seems somewhat strange that
motivational analysis has not included
the various levels of management.
Studies of fatigue have concentrated
on the worker, whose hours are
typically much shorter than the
manager's. The emphasis on social and
egoistic need satisfactions at work
has been primarily on the hourly
paid worker, as if he alone had the
sensibilities to avoid the rigid determinism
of economic motivation.
Some of the role problems of the foreman
in complex structures have been
pointed to (Ghiselli & Lodahl, 1958;
Porter, 1959), though there is little 
research, but there is virtually no 
work in psychology on prestige and 
status as springs of action in the man- 
agement structure. Our picture of 
the problems of human motivation 
and organizational goals will never be 
complete without an analysis of the 
directing part of the structure as well 
as the larger portion of the work 
force. 


Communication 


The general problem of communi- 
cations has long been a part of indus- 
trial psychology. Early it was largely 
stimulus bound. The human engi- 
neering work on the size of type is in 
this tradition, and, to a large extent, 
the Flesch count is a modernization 
of this original question, substituting 
length of sentences for size of type. 
Early work in advertising, too, gen- 
erally followed this tradition, con- 
centrating on stimulus characteristics 
designed to attract attention.2 More 
recently, work on communication 
has turned to structural factors in 
groups, mentioned above, and to a 
more detailed study of the process of 
communication. 

2 This paper does not go into the considerable
body of work which has recently been
done in the field of market research and consumer
behavior. It could go appropriately
here, but it is covered in another paper in this
series, by Paul Lazarsfeld, in The American
Journal of Sociology.

As the problem moves from the
simple one of stimulus presentation
and a criterion of recall or failure to
recall, the details of the process become
important. In industry itself,
some studies have suggested that certain
kinds of information can be differentially
utilized by workers (Chisholm,
1955). Hovland and co-workers
(Hovland, 1957; Hovland, Janis,
& Kelley, 1953) have considerably
differentiated and enlarged the stimulus
problem. They raise the problem
of primacy, tying the problem of
mediating attitude change to more
traditional learning studies. Hovland
also extends earlier work on the prestige
of the communicator by varying
source credibility, and opens the subject’s
motivational system by studying
the effects of fear-arousing propaganda.
The Yale studies also enlarged
the criterion problem, studying
the process as well as the fact of
retention by observing changes in 
material over a period of time. Fur- 
ther internalization of the process is 
indicated by the suggestion that 
identifiable steps of identification, in- 
ternalization, and compliance can 
fruitfully be recognized in the sub- 
ject. Cartwright (1949) made some 
detailed suggestions about the condi- 
tions necessary for communication to 
produce a change in attitude. An- 
other facet of the process has been 
opened by Festinger’s (1950) inquiry 
into the motives for communication. 
Most of these developments have 
been in relatively pure laboratory 
situations rather than in industry. 
However, the variables seem imme- 
diately relevant to present industrial 
problems, and the progress in stating 
the problem promises future growth 
in the attack on communication in 
industry. 

The rise in the field of social per- 
ception probably deserves to be in- 
cluded under the general heading of 
communication, since it deals with 
variables influencing the reception 
of stimulus information. The term 
“social perception” seems to have 
two distinct meanings (apart from the
third usage which Brunswik gave it): 
on the one hand, the influence of the 
social group on the process of percep- 
tion, stemming from the Sherif ex- 
periments on autokinetic movement, 
of which the conformity studies already
mentioned are ideological descendants.
The other meaning is the
perception of social phenomena—of 
personalities, roles, institutions, and 
the like. Here there is a great deal of 
work relevant to industrial problems 
—studies of roles and role percep- 
tions in hierarchies, of labor and 
management’s perceptions of one an- 
other, and the definition of leadership 
in terms of the perception of supe- 
riors and subordinates. 

In dealing with the problem of in- 
formation presentation under the 
general heading of human engineer- 
ing, it was necessary to broaden the 
problem somewhat to the problem of 
decision theory, the requirements of 
information in teams, and informa- 
tion under conditions of uncertainty. 
Somewhat the same set of problems 
appears properly under the present
heading. Edwards (1954) summar- 
ized the work on decision making and 
its relation to economic behavior, and 
the field has made some progress 
since that time. It still seems an un- 
usually fruitful area for research. 
Atkinson (1957), for example, has 
opened the question of motivational 
determinants of risk-taking behavior, 
and Scodel and others (Scodel,
Ratoosh, & Minas, in press), returning
to the tradition of the personnel
psychologist, have investigated the 
personality correlates of risk-taking. 
The questions in this area are di- 
rectly related to the typical business 
situation and make it possible to 
bring psychological principles to bear 
most immediately. It has been 
pointed out before that the psycholo- 
gist has traditionally inveighed 
against an oversimplified concept of 
economic man as a model for motiva- 
tion. The risk-taking problem lets us 
test more sophisticated motivational 
notions directly in the very situation 
for which the economic man model 
was invented. Expectations, levels of
aspiration, past experience, and self
concepts all are relevant to the risk- 
taker’s behavior. The amount and 
kind of information is a variable in 
strategic decision matrices. The posi- 
tion of an individual within a group 
bears an immediate relevance to the
problem. In a sense the restriction of
output already referred to is part of 
the problem—either in its original 
sense of turning away from economic 
advantage to maximize social need 
satisfactions, illustrating some transi- 
tivity in motivational values in util- 
ity curves, or in its recent sense of a 
more complex bargaining for con- 
tractual advantages in a situation 
more nearly approaching the classic 
theory of games. The combination of 
these problems with, for example, 
those of the experimental sharing of 
risks, and the human engineer’s in- 
terest in information display, would 
seem to indicate a broad and fruitful 
field for the future. 


CONCLUSION 


It is perhaps easiest to set a sum- 
mary statement in terms of a few 
brief points: 

1. The history of the field points to 
the term “psychology in industry,” 
rather than “industrial psychology.” 
Within it three quite separate tradi- 
tions thrive: the experimentalist, 
manipulating an independent varia- 
ble which is usually a physical stim- 
ulus, the differential psychologist, 
and the social psychologist. Each of 
the three carries the threads of gen- 
eral psychological theory, and the 
development of the field is primarily 
the development of psychology, mod- 
ified somewhat by external societal 
pressures, and somewhat less by spe- 
cific demands of industrial problems. 

2. In general, the whole field has 
focused on work, workers, and the 
conditions of work, and, to some extent,
in spite of the interest in leadership,
the problems of the
motivation of management, conditions
of work for leaders, and the like
have been largely neglected. Recent
interest in the general problem of or- 
ganization seems to draw attention 
to the over-all problem of the opera- 
tion rather than to the level of pro- 
duction. 

3. Traditional personnel psychol- 
ogy seems to have stalled on the prob- 
lem of selection and classification, 
though iterative solutions to the 
classification problem provide a new 
elegance. The problem of the crite- 
rion continues to be the chief stum- 
bling block, as it has been for 40 
years. Assessment of high level talent
and research on creativity are
opening a new route in this area, with 
the possibility of taking advantage of 
the arithmetic of correlation to pro- 
vide considerably greater leverage in 
the selection of men for managerial 
positions. Like the criterion problem, 
the assessment of training continues 
to be much discussed but is dealt 
with only to a limited extent. 

4. Among the areas which look 
promising for the future, that com- 
bining the problems of risk-taking 
and decision theory seems to com- 
bine most of the psychological fields 
and to promise relatively immediate 
yields. Outside of psychology there 
is a good deal of theoretical develop- 
ment to set the problem. Within 
psychology, most of the fields con- 
cerned bear on it. The differential 
psychologist concerns himself with 
personality correlates of risk-taking, 
the engineer with information in de- 
cision making, and the social theorist 
with motivational problems in both 
areas and with the structure of 
groups, for example, in shared risks.
Relatively few publications have ap- 
peared so far in this area. 

5. A second area which combines 
the various fields is the growing interest
in organization theory. The
communications theorist approaches 
it through the problem of networks, 
the human engineer approaches it 
through a similar problem in provid- 
ing information in complex systems, 
and the social psychologist enters the 
problem through the various routes 
of motivation, group structure, and 
roles and status. There is consider- 
able activity in the field, but a sur- 
prising dearth of empirical research. 
Though we have quite a few the- 
oretical statements about how or- 
ganizations grow, we have, surpris- 
ingly, virtually no simple histories of 
how they have in fact grown. Both 
empirical and theoretical work in this 
area seems just around the corner.

6. Running through a good many 
problems, one development seems 
associated with progress, as it is in 
psychology in general. Where a new 
advance has been made, it has usu- 
ally been with a differentiation of the 
criterion of the original problem, 
which made it possible to bring to 
bear other areas of psychological 
theory. For example, in communica- 
tion, the advance from an early stim- 
ulus bound tradition is marked by 
the work of Hovland and his co- 
workers in opening problems of recall 
and long-term effect on the one hand, 
and conditions of the communicated 
material on the other. Festinger 
opened the question of motivation 
for communication, and the psycho- 
logical process became the topic 
rather than the simple fact of com- 
munication. Again, in the area of 
selection, where the criterion seems 
to block further research, a simple 
lengthening in time of the criterion 
measure might lead the way to 
further insight. Present criteria are 
usually tested shortly after employ- 
ment, and obscure the nature of the 
development of criterion perform- 
ance. Longer measures might focus 
attention on the slope of the curve of
response to training, to the interac- 
tion of treatment and aptitudes, or 
to the change in factorial content of 
skills with practice. In the general 
area of work and productivity, the 
Coch and French study of resistance 
to change shows the benefit from dif- 
ferentiation of the criterion and the 
provision of theoretical meaningful- 
ness to the new parts. Similarly, in 
the studies of motivation and produc- 
tion, the opening of the area of the 
ecology of motivation by studying 
the work-place both advances the 
understanding and broadens the the- 
oretical base on which it rests. If 
there is one critical area to advance 
in these problems, the progressive dif- 
ferentiation of the criterion would 
seem to be it. 

7. Finally, a group of other prob- 
lems seems to have the support in 
psychological theory for development 
in this field. The problems of roles 
and status in well-organized hier- 
archies, the general area of a shifting 
management philosophy in response 
to changes in the corporation and the 
society, and the problem of conform- 
ity are illustrative. In some cases the 
psychological problem needs to be 
broadened to include the societal one,
as, for example, in the selection area.
As technology demands more and 
more high level skills, we tend to 
identify and assign them as if we were 
drawing from a bottomless pool. We 
treat special aptitudes much the way 
we once treated the buffalo or the for- 
est problem, giving little thought to 
the fact that in each case we are exhausting
a finite population. As we
build more technical schools, staff 
more laboratories, and rationalize 
more industrial operations, we will 
need careful demographic studies of 
the available skills and an attack on 
the classification problem on a broad 
scale to maximize the utilization of 
skills, not within an operation but 
over a wide group of operations. It 
is not at all suggested that research 
on psychological problems in industry 
should be largely guided by pressures 
outside the field. Past advance on 
the contrary basis is too persuasive 
an argument for that. However, the 
very success in applying psychologi- 
cal principles to the problems in busi- 
ness and industry forces us to broaden 
the base of our interest to include, to 
some extent, the developments both 
within the industrial complex and in 
the society in which it is imbedded. 
Few major topics in contemporary
psychology appear to offer more 
promise than rigidity, and the 
amount of work reported on this sub- 
ject has been increasing year by year. 
Unfortunately it is also the case that 
few areas present such a quagmire to 
the unwary investigator; rigidity is 
not a simple concept, and the subdi- 
visions within it are far from clear. 
The aim of the present paper is two- 
fold: to summarize and assess the re- 
sults of recent explorations, and to 
suggest one or two paths which might 
be followed further. 

“Rigidity” has proved a difficult 
term to define acceptably for it has 
been used to describe behaviors char- 
acterized by the inability to change 
habits, sets, attitudes, and discrim- 
inations. It has grown out of related 
topics, such as perseveration and the 
analysis of personality traits. The 
term perseveration was first used by 
Neisser in 1894, and described by 
Spearman in 1927 in his general
mental law of inertia: “Cognitive
processes always both begin and 
cease more gradually than their ap- 
parent causes.” The experimental 
work on perseveration was largely 
concentrated on the motor and 
sensory processes, though emotional 
and ideational aspects were also con- 
sidered by some people. The next 
step forward came in 1935 when Cat- 
tell (1935a) made the distinction be- 
tween “the inertia of mental proc- 
esses,” found when a person is asked 
to alternate between two previously
practiced motor skills, and “disposition
rigidity,” which operates when a
familiar task has to be performed in 
some new way. The latter obviously 
bears a relationship to retroactive 
inhibition and to habit interference, 
and it is only this aspect of persevera- 
tion which Cattell has studied fur- 
ther. He found that disposition rigid- 
ity was related to certain personality 
factors—those of submissiveness and 
lack of character integration. 

Similarly, the history of work on 
the rigid personality can be traced 
back to William James, who divided 
men into tough and tender minded, 
and to Thurstone who found a factor 
of radicalism-conservatism. Eysenck 
(1944) combined these two in the 
same matrix and found that they 
were orthogonal to each other. 
Frenkel-Brunswik (1949a) suggested
that the prejudiced person
(i.e. tough and conservative) also had 
a rigid personality. Obviously, this 
“rigidity” need not be the same as 
that studied by Cattell, but the use 
of the same word has highlighted the 
search for a link between flexibility of 
personality and flexibility of change 
in ideas and habits. 

In the light of the history of the 
term, one of the best definitions 
seems to be that given by Cattell 
(Cattell & Tiner, 1949) when he de- 
scribed disposition rigidity as the dif- 
ficulty with which old established 
habits may be changed in the pres- 
ence of new demands. Examples of 
other definitions are those by Rokeach
(1948): “the inability to change
one’s set when the objective conditions
demand it,” Buss (1952): “resistance
to shifting from old to new




196 SHEILA M. CHOWN 


discriminations,” Goldstein (1943): 
“adherence to a present performance 
in an inadequate way,” and Werner 
(1946a): “lack of variability of re- 
sponse.” These definitions agree in 
general outline but vary in detail to 
an extent sufficient to cause confu- 
sion. There is no agreement over the 
meaning of “a demand to change,”
“an inadequate performance,” “resistance
to shifting a set,” and “inability
to change.” For example, the
change of method demanded in some 
tests has been one which is not essen- 
tial to success in the task; it may 
even cause the time spent on the task 
to increase. 

Explanations of Rigidity

Three explanatory approaches to 
the study of rigid behavior will now 
be described; these are Cattell’s 
work, Goldstein’s study of the brain- 
injured, and the Lewinian theory. 
Cattell (Cattell & Tiner, 1949) dis- 
criminated between two types of be- 
havior which had both previously 
been called “perseveration.” The 
first is “process rigidity” or the tend- 
ency for a former response to continue 
although a new stimulus has been 
substituted for the old one. Another 
name for process rigidity is “mental 
inertia,” and it is best seen in alterna- 
tion tasks. (Temporal contiguity is 
essential if it is to occur at all.) He 
calls the second type “structural 
rigidity.” This is the resistance of a 
habit or personality trait to forces 
which might be expected to change 
it. The habit remains unchanged de- 
spite the fact that a more “reward- 
ing” response to the new stimulus 
could be made. Temporal contiguity 
of the two activities need not occur. 
It is difficult to distinguish com- 
pletely between process rigidity and 
structural rigidity, though each can 
be clearly described by examples. 
Walker, Staines, and Kenna (1943)
showed that process rigidity could actually
be explained in terms of structural
rigidity.

Cattell suggested that there might 
be three main causes of structural 
rigidity. In the first place, low g 
and also low “fluency of random as- 
sociation” might prevent a person 
from seeing that a new response is 
necessary or prevent his realizing 
what the new response should be. 
Defective strength of motivation or 
conflicting motives might also cause 
structural rigidity. Cattell maintains 
that there is a third source of rigidity, 
which lies in “a resistance to change 
of neural discharge paths,” which 
will be “a basic attribute of all dis- 
positions,” and which cannot be al- 
tered by applying rewards or punish- 
ment. He equates this with “disposi- 
tion rigidity” which he considers to 
be the most interesting source of 
rigid behavior. However, in his 
opinion, all research on rigidity 
should control the three main causes 
of structural rigidity and relate any 
new discoveries to them. The neces- 
sity for studying all causes of a par- 
ticular piece of rigid behavior is very 
obvious in any field investigation. 

It seems likely that both process 
rigidity and structural rigidity will 
enter into learning difficulties. Mental
inertia will probably be of small
importance compared with the effects
of having to modify a habit of
long standing. This paper will not 
deal with process rigidity, since a dis- 
cussion of this may be found in 
Eysenck (1953). 

Many investigators have been in- 
terested in the rigid behavior which is 
a symptom of certain abnormal con- 
ditions. Goldstein (1943) deals with 
the two types of rigidity resulting 
from organic brain damage. “Primary
rigidity” is the inability of a
patient to change from one train of 
thought to another. The patient pays
no attention at all to stimuli which
are unrelated to the matter in hand. 
“Secondary rigidity” is displayed 
when a person is faced with a problem 
which is too difficult for him; he pre- 
fers an incorrect answer to making no 
response at all. It occurs particularly 
when patients are asked to deal with 
abstract things, which they can no 
longer understand. The main cause 
of rigidity in these patients would 
appear to be lack of g. However, 
Goldstein points out that lack of g 
due to brain injury may also cause
“distractability” when the patient
will light in turn on a number of 
small points rather than face the 
main problem. 

Werner (1946b) has shown that de- 
fectives display a different type of 
rigidity from those with brain inju- 
ries. The former fail to solve problems 
because they oversimplify them. The 
latter seize on resemblances to prob- 
lems which they have previously 
faced, and try to use methods which 
are no longer applicable. For Werner, 
rigidity is a functional rather than a 
structural concept, and multiform 
rather than unitary. He holds that 
in many cases differences between 
investigators stem from their differ- 
ent interpretations of the word 
“rigidity.” 

The Lewinian theory links the ex- 
istence of rigidity to the presence of 
strong boundaries between mental 
functions. As various boundaries 
may differ in strength in one indi- 
vidual no relationship can be ex- 
pected between one test of rigidity 
and another which involves a differ- 
ent function. The results explained 
in this way can, however, be explained
in other ways as Goldstein
(1943) has already pointed out. Most 
experimenters have avoided the 
Lewinian use of the term, preferring 
to think of rigidity as a description of
behavior for which further explanation
is necessary, rather than as an
explanation in itself. “Permeability” 
would be a better name for the prop- 
erty of the “mental boundaries” in- 
volved in the Lewinian explanation. 


TESTS 


The experimental work on rigidity 
has been carried out with the aid of 
a number of tests, each of which can 
be said to measure “rigidity” in its 
own right. But the relationships be- 
tween these tests are not always 
known, and where they have been in- 
vestigated, it seems as though more 
than one type of “rigidity” is in- 
volved. 

A brief description of the main 
tests will be given here; their rela- 
tionships with each other will be de- 
scribed more fully in the next section. 


Einstellung Tests 


Einstellung tests involve building 
up a “set” in the S and then giving 
him a problem which is best solved 
in some way other than the one he is 
expecting. 

The best known Einstellung test 
was devised by Luchins (1951a, 
1951b). The original version con- 
sisted of 10 problems in which the Ss 
had to discover a method of obtain- 
ing a certain quantity of water if 
given three containers of specified 
size. The first five problems could be 
solved in one way only and were 
meant to establish a set. The next 
two could be solved either in the set 
way, or by a shorter method. The 
eighth could be solved only by the 
second method and the last two by 
either method. The S's methods of 
solving the sixth and seventh prob- 
lems are said to give a measure of his 
susceptibility to set, and the last 
three problems show how well he is
able to overcome the set. Modifica- 
tions of the test have included the 
introduction of an initial problem
which can be solved by the “long”
or “short” method, alteration in the
number of “set” problems given,
omission of the problem which can 
only be solved by the short method, 
and using all types of “error” to form 
one rigidity score. 
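
The structure of the water jar series can be made concrete with a short sketch. The following Python fragment is only an illustration under stated assumptions: the text does not give the jar sizes or spell out the two methods, so the "set" method is taken here to be B - A - 2C, the shorter method to be A - C or A + C, and the three example problems use illustrative numbers. The names JarProblem, set_method_works, short_method_works, and score are hypothetical. The scoring follows the description above and keeps susceptibility to set (problems 6 and 7) apart from ability to overcome it (problems 8 to 10).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JarProblem:
    """One water jar problem: three jar capacities and a target quantity."""
    a: int
    b: int
    c: int
    target: int


def set_method_works(p: JarProblem) -> bool:
    # Assumed "long" set method: fill B, pour off A once and C twice.
    return p.b - p.a - 2 * p.c == p.target


def short_method_works(p: JarProblem) -> bool:
    # Assumed shorter method: A - C or A + C.
    return p.target in (p.a - p.c, p.a + p.c)


def score(methods: dict) -> tuple:
    """Return (susceptibility to set, ability to overcome set).

    `methods` maps problem numbers 6..10 to "set", "short", or "fail".
    The two counts are returned separately rather than summed.
    """
    susceptibility = sum(methods[n] == "set" for n in (6, 7))
    overcome = sum(methods[n] == "short" for n in (8, 9, 10))
    return susceptibility, overcome


# Illustrative problems (values are assumptions, not taken from the text).
set_problem = JarProblem(21, 127, 3, 100)      # only the set method works
critical_problem = JarProblem(23, 49, 3, 20)   # both methods work
short_only_problem = JarProblem(28, 76, 3, 25)  # only the short method works
```

A record such as {6: "set", 7: "set", 8: "fail", 9: "set", 10: "short"} would give a susceptibility count of 2 and an overcoming count of 1 under this scheme.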

Cowen, Wiener, and Hess (1953) 
developed a test of Einstellung rigid- 
ity using the same principles as the 
water jar test but involving a differ- 
ent type of skill. Each of the prob- 
lems of their alphabet maze test is a 
six-by-six letter square. The S has to 
move one box at a time from the up- 
per right hand corner of each square 
to the lower left hand corner spelling 
out words on the way. He may move 
in any direction as long as the move 
helps to spell out a word. In case 
more than one path is available, the 
correct solution is the one that uses 
the fewest number of boxes. 

Another form of Einstellung test 
was that of Rees and Israel (1935) 
who used anagrams. The scrambling 
order is the same for all the problems 
of the first series. Then anagrams are 
given which can be solved by using 
either the same order or another one. 
An alternative form of anagrams test 
first uses words which are all associ- 
ated with one topic, e.g. food or na- 
ture, and then gives anagrams having 
an alternative solution which is not 
connected with this topic. Experi- 
menters have usually had an intro- 
ductory series of 20 to 25 anagrams, 
and a similar number of “critical” 
problems. 
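
The idea of a constant scrambling order can be sketched in a few lines. This Python fragment is an illustration only: the particular order, the example word, and the function names scramble and unscramble are assumptions and are not taken from Rees and Israel. It shows that once the order is fixed, every anagram in the training series yields to the same rearrangement, which is what allows a set to build up.

```python
def scramble(word: str, order: tuple) -> str:
    """Build an anagram: position i of the result takes word[order[i]]."""
    if sorted(order) != list(range(len(word))):
        raise ValueError("order must be a permutation of the letter positions")
    return "".join(word[i] for i in order)


def unscramble(anagram: str, order: tuple) -> str:
    """Undo scramble() by sending each letter back to its source position."""
    if sorted(order) != list(range(len(anagram))):
        raise ValueError("order must be a permutation of the letter positions")
    letters = [""] * len(anagram)
    for position, source in enumerate(order):
        letters[source] = anagram[position]
    return "".join(letters)


# Hypothetical scrambling order for five-letter words.
ORDER = (2, 0, 4, 1, 3)
```

With this order, scramble("bread", ORDER) gives "ebdra", and applying unscramble with the same order recovers "bread". A critical anagram would be one for which both this order and some other rearrangement produce a real word.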

The water jar test of rigidity has 
received its most severe criticism 
from Luchins himself (1951b). Levitt 
(1956) in his recent review of the 
literature concerning Einstellung 
tests also covers many of the seven 
points described below. Firstly, users 
of the test assume that Einstellung
rigidity can be equated with rigidity
as displayed in behavior in the world. 
The criticism is true of all tests of
rigidity, since none of them have
been measured against actual behav- 
ior. Secondly, a certain amount of 
ability at arithmetic is needed to do 
the tests at all; most experimenters 
have had to exclude a number of Ss 
who could not subtract correctly. 
Another point is that some people use 
the long method of solution even be- 
fore they have been “trained” by the 
set problems. In many experiments 
these people too have been excluded. 
The fourth—and greatest—fault is 
that problems with alternative solu- 
tions do not require the Ss to change 
the method they have been using all 
along. For these “trained” Ss, the 
“set” method may well be quicker 
than the new shorter method. A 
fifth serious criticism is that the Ss
may have other reasons than rigidity
of thought for their choice of the
“set” method when a simpler alternative
is possible. These reasons include
thinking that they are meant to
use the same method each time, and
thinking that all three jars must be
used to do each problem. Sixthly,
the emotional impact of the words
“test” and “quickly” in the instructions
may cause the Ss to use methods
which they would not otherwise
employ. Lastly, in scoring the test,
it seems apparent that susceptibility
to set and ability to overcome it are
different measures and should be
kept separate.

The alphabet maze test is open to 
many of the same objections as the 
water jar test. Moreover in those 
problems which include two paths 
the transition probabilities between 
letters should have been made equal 
at the point of choice. For example 
the correct solution in one problem is 
“her hat” and the incorrect one “her
red suit.” After h, e, r, another r is
more probable than an h, and the S's 
choice may be biased even if he is not 
aware of it. 


Tests employing anagrams have 
the disadvantage that Ss might find
the “nonrigid” solution, and go on 
looking for the rigid one. Unlike the 
arithmetic problems, there is no 
“better” or “best” solution to an ana- 
gram that can be solved in two or 
more ways. 

Though attempts to develop other 
Einstellung tests have been made, 
these three seem the most hopeful. 
It seems that a better measure of our 
chosen definition of rigidity would be 
obtained by cutting out “suscepti- 
bility to set” and dealing only with 
“the ability to overcome set.” The 
precautions suggested previously 
should be taken when any of these 
tests are to be used and there are 
other improvements which could be 
made. For instance, a more satisfac- 
tory scoring system than “number of 
non-Einstellung problems correctly 
solved” might be achieved. One possible
measure would be the time
taken to solve each of these problems,
compared with the time taken to 
solve the Einstellung problems. Such 
a measure would enable a number of 
Einstellung series to be given to one 
individual and ranged in order of the 
difficulty of the change required to 
solve the non-set problems. In this 
way, it would be possible to develop 
a rigidity scale. 
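
The timing measure proposed above can be sketched as follows. This is only an illustration: the text says the times should be "compared," and the use of a simple ratio of mean times, the function name `time_ratio_score`, and the sample times are all assumptions made here, not part of any of the studies reviewed.

```python
from statistics import mean

def time_ratio_score(critical_times, set_times):
    """Ratio of mean solution time on non-set (critical) problems
    to mean solution time on the Einstellung (set) problems.
    The ratio form is an assumption; the text only says 'compared'."""
    if not critical_times or not set_times:
        raise ValueError("both series need at least one timed problem")
    return mean(critical_times) / mean(set_times)

# Hypothetical times in seconds for one S on two Einstellung series.
series = {
    "series_a": time_ratio_score([40.0, 50.0], [30.0, 30.0]),
    "series_b": time_ratio_score([90.0, 90.0], [30.0, 30.0]),
}

# Order the series by how difficult the change of method was for this S.
ranked = sorted(series, key=series.get)
```

On this reading, a larger ratio means the S needed relatively longer to abandon the set, so ranking several series by the ratio gives the ordering by difficulty of change that a rigidity scale would require.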


Concept Formation Tests 


Several concept formation tasks 
have been used to study intellectual 
rigidity. Examples are the Wisconsin 
card test, the Weigl card sorting test, 
the New York University card sort- 
ing test, and Buss’s wooden blocks 
test which he took over from the 
Vigotsky test. The Wisconsin card 
test will be described more fully. It 
consists of four stimulus cards and 64 
response cards. The stimulus cards 
contain one red triangle, two green 
stars, three yellow crosses or four blue 
circles. Each response card has on it 
one of four shapes in one of four colors,
and the figures appear from one
to four times on the card. Thus it is 
possible to sort the cards according to 
shape, color or number under appro- 
priate stimulus cards. 
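
The figure of 64 response cards follows from the description: four shapes, four colors and four possible numbers of figures give 4 x 4 x 4 combinations. A minimal sketch of the deck is given here, assuming that the response cards use the same four shapes and colors as the stimulus cards; the tuple layout and the helper `matching_stimulus` are illustrative only.

```python
from itertools import product

SHAPES = ("triangle", "star", "cross", "circle")
COLORS = ("red", "green", "yellow", "blue")
COUNTS = (1, 2, 3, 4)

# The four stimulus cards described in the text, as (count, color, shape).
STIMULUS = [(1, "red", "triangle"), (2, "green", "star"),
            (3, "yellow", "cross"), (4, "blue", "circle")]

# Every combination of count, color and shape gives one response card.
RESPONSE = list(product(COUNTS, COLORS, SHAPES))

def matching_stimulus(card, rule):
    """Index of the stimulus card that `card` matches under `rule`
    ('number', 'color' or 'shape')."""
    field = {"number": 0, "color": 1, "shape": 2}[rule]
    return next(i for i, s in enumerate(STIMULUS) if s[field] == card[field])
```

Because each stimulus card carries a distinct number, color and shape, any response card has exactly one correct pile under each of the three sorting rules, and those piles can all differ.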

In all the tests, the general pro- 
cedure has been for the investigator 
to decide arbitrarily which variable is 
to be the basis for grouping and then 
to ask the Ss to discover it. Once the 
S has responded correctly a certain 
number of times, the critical variable 
is changed without informing him. In 
this way, he is forced to form a new 
concept or else to continue to fail by 
sticking to the one which was previ- 
ously successful. It has been found 
that some Ss never realize what is 
happening, while others have such 
clear insight that they can later de- 
scribe the experimenter’s plan of 
campaign. One difficulty in evaluat- 
ing the test is that a person who once 
realizes that the experimenter has 
changed the criterion has a good 
chance of recognizing further changes. 
In fact, he will become “set” to look 
out for changes. 
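
The general procedure just described can be sketched as a scoring routine. The criterion of five consecutive correct sorts, the function name, and the definition of a perseverative error as a sort by the immediately preceding rule are assumptions for illustration; the text says only "a certain number of times."

```python
def run_sorting_session(responses, rules, criterion=5):
    """Score a sorting session.

    responses: the dimension the S sorted by on each trial.
    rules: the experimenter's successive criteria, in order.
    criterion: consecutive correct sorts required before the rule shifts.
    Returns (shifts_made, perseverative_errors), where a perseverative
    error is a sort by the immediately preceding, no longer correct rule.
    """
    current, streak, shifts, perseverative = 0, 0, 0, 0
    for response in responses:
        if response == rules[current]:
            streak += 1
            # Shift the rule silently once the criterion is reached.
            if streak == criterion and current < len(rules) - 1:
                current, streak, shifts = current + 1, 0, shifts + 1
        else:
            streak = 0
            if current > 0 and response == rules[current - 1]:
                perseverative += 1
    return shifts, perseverative
```

For example, with `rules=["color", "shape"]` and `criterion=2`, an S who sorts by color four times and then by shape twice triggers one shift and makes two perseverative errors before adopting the new concept.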

Concept formation plays an im- 
portant part in everyday life, and 
tests which utilize this function seem 
worth pursuing. Again, concepts of 
different levels of difficulty need to 
be studied and work is needed which 
will compare a person’s performance 
when he knows changes will occur 
with his performance when he is left 
to discover the changes for himself. 


Personality Tests 


Only one paper and pencil test of 
personality rigidity has been pro- 
duced so far. Wesley (1953) chose 
the items for this empirically, and 
then had them rated for degree of 
rigidity by five psychologists. The 
final test consisted of 50 items which 
all the judges rated as “high.” Zelen 
and Levitt (1954) maintain that a 
number of the items are duplicated 
and that the same reliability and
validity were obtainable from fewer
items. They chose 12 which looked 
to them to be representative of the 
whole test and claimed that the 
shorter version correlated highly with 
the original version (r =.73). They 
also claimed that the reliability was 
at least as high as that of the original 
version (r=.74 for both). The short- 
ened form correlated to the same ex- 
tent (.38) as the longer one (.26) with 
the California Ethnocentrism Scale 
—the only comparison of validity 
made. The shortened form contains 
one item which could hardly be ap- 
plicable in all countries, i.e. “I never 
miss going to church,” but the other 
items appear to be “culture free.” 
This test in both its forms is of 
course open to all the criticisms generally
levelled at paper and pencil
inventories. Also no item analysis 
or analyses of the correlations be- 
tween items is reported, so that 
there is no evidence that the test is 
unidimensional. That Zelen and
Levitt were able to choose a certain 
number to “represent” the whole 
test suggests that there may be 
clusters of similar items in the orig- 
inal version. There is, of course, no 
guarantee that the test would cor- 
relate with actual behavior, even 
though it may give a picture of the 
way individuals regard their own be- 
havior; this could be overcome to 
some extent by comparison with
friends’ ratings, since most of the 
items refer to instances of behavior 
rather than feelings. The test urgently
requires factor analyzing and
checking against a criterion for valid- 
ity, and the results so far achieved 
with it suggest that these tasks would 
repay the effort involved in carrying 
them out. 


Other Tests of Rigidity 


One test which has been chosen 
fairly frequently to study rigidity is a 
perceptual task, the hidden objects
test. The S is asked to find as many
hidden objects as possible in pictures
similar to those often found in children’s
magazines. Numerous hidden
objects are present and it is supposed
that rigid Ss will have more difficulty
in seeing them and so will produce a
shorter list. Cattell (1946) found
that this test had the highest loading
of any on his factor of “disposition
rigidity.” This type of test does not
always correlate highly with the superficially
similar “Gottschaldt figures
test” where certain outlines are disguised
in complicated figures and the
S is asked to pick out the simple outlines.


stereoscopic perception, 
rigidity. The effect of the lenses is to
“make a table appear to tip up like a
drawing board or to make a wall lean
towards the observer.” Becker noticed
that people varied in the time it
took them to see the distortion and
in the degree to which they experienced
it. He suggested that more
“rigid” individuals, who manipulated
the world to conform to their own
preconceptions, would report less distortion
and take longer to see it.
Learning effects did not occur after
the first two trials, and both time interval
and degree of distortion appeared
to remain constant from situation
to situation. Subjects were
asked to call out as soon as the distortion
occurred and to estimate how
much higher the back of a plane sur- 


resent the distorted table top instead 
of giving a rough 
The effects of expecting the distortion have
not been studied: just how much
“learning” occurred during the first 
two trials, and why. The rationale 
behind the test is not clear; it can 
easily be argued that rigid people 
should see the distortion more quickly 
instead of less quickly, on the grounds that
they should be less able to manipu- 
late their perception when this is de- 
manded of them. More work is 
needed on the test, but it is one in 
which there are possibilities, since it 
offers opportunity for objective meas- 
urement and comparison of one indi- 
vidual with another. 

Originally Becker noticed that 
children were more susceptible than 
adults to the distortion, and it would 
be interesting to discover whether
further age differences exist among
adults. It would also be interesting
to discover whether this type of perceptual
rigidity is related to set, as
measured by Einstellung tests, and
whether ease of change of other habits,
not necessarily perceptual, is related
to ease of change of perception.

The California Ethnocentrism 
Scale (Adorno, Frenkel-Brunswik, 
Levinson, & Sanford, 1949) has 
shown some slight relationship to 
other measures of rigidity and some 
experimenters have used racial prejudice
as a rigidity test in its own
right. However, though rigidity and 
ethnocentrism may well be related, 
there is no evidence that they are 
synonymous and assumptions of this 
nature must be viewed with suspi- 
cion. The California F scale has also 
been used, in this case the assumption 
being that authoritarianism and ri- 
gidity are closely related. 

The Rorschach test has also been 
used to get a measure of personality 
rigidity. In some cases the protocols 
have been rated by expert judges 
looking for previously decided signs 
of rigidity; in other cases some of the 
actual scores have been taken as possible
measures. Total production, for
example, or the ratio of “part pic- 
tures” to “wholes” are two of the 
measures that have been utilized. 
This test seems most suitable as a 
measure of “creative” rigidity; the 
blots impose no limits, and hence 
things such as imagination are bound 
to enter into the results. 


RELATION BETWEEN TESTS 


It is uncertain whether there is any 
generalized trait of rigidity. As Cron- 
bach (1956) puts it, “in those studies 
which reject a general factor the re- 
liability of the measures is uncertain. 
In those which find positive relations, 
test-taking attitudes may provide an 
adequate explanation.” Much ap- 
pears to depend upon the tests of ri- 
gidity used, the type and number of 
tasks involved, and the conditions 
under which testing was carried out. 

In describing the known relation- 
ships between tests, difficulties of 
classification have arisen. Few in- 
vestigations overlap; each experi- 
menter has tended to use a different 
combination of tests, including some 
unique ones. Sometimes, however, 
more than one of the well known 
tests already described have been in- 
cluded. In order to avoid mentioning 
the same investigation more than 
once, the order in which the tests 
were described in the previous section
has been used as “an order of
precedence.” Investigations employing
Einstellung tests are, therefore,
described first, then those using concept
formation tasks and so on.
Under each heading, investigations
employing more than one major test 
are described before those using one 
major test and several other tests. 

A summary of the relationships is 
given at the end of the section. 


Einstellung Tests


The water jar test was found by
Cowen, Wiener, and Hess (1953) to
be related to the alphabet maze “set”
test (r =.42). They controlled variables
of age and intelligence and presented
the two tests in an exactly
parallel fashion to 59 college undergraduates.
Neither of the test scores
was distributed normally; both 
curves were U-shaped. It would have 
been surprising if, given a group with 
the ability to do the type of arith- 
metic and spelling problems involved 
in the two tests, the rigidity scores 
on the tests had not been related. 
Goldstein and Sheerer (1941) used 
the Luchins test, an anagrams test 
(adapted from Rees & Israel, 1935), 
the Shipley-Hartford Retreat Scale, 
and Thurstone attitude scales to- 
wards the Bible, Censorship, Patri- 
otism, and the Law, in order to look 
for social attitudes which could be re- 
lated to Einstellung rigidity. Sub- 
jects were 150 undergraduates. No 
significant intercorrelations were 
found between the two Einstellung 
tests, nor were there any consistent 
relationships between these tests, the 
Shipley-Hartford Retreat Scale and 
the attitude scales. It is surprising 
that the correlation between water 
jar and anagrams (the highest be- 
tween any of the tests) only reached 
r=.17, which is not significant at the 
.05 level of probability. The experimenters
themselves point out that
their Ss may have supposed the experiments
were intended to test
speed or ability to use all the jars. It
must also be remembered that the Ss
were not volunteers and that two sessions
at an interval of a fortnight
were used to administer the tests.

Rokeach (1948) endeavored to find 
a factor of rigidity in ethnocentric 
people. He used the Luchins water 
jar test and the California Ethno- 
centrism Scale. He divided his Ss 
into high and low prejudice groups 
according to their scores on the California
Ethnocentrism Scale. He
called those who scored above the
mean of his group the highly prejudiced
and those who scored below it
the low prejudice group. His 70 Ss
were white American-born undergraduates.
He found that the high
prejudice group was more rigid, and
more liable to think concretely. Using
a map test (rather like a maze
test), he confirmed the result obtained
with the water jar test. His
work has been criticized by Luchins
(1949) on the grounds of the subject
matter of the Ethnocentrism test, the
dichotomized results, and the purely
verbal nature of the test. He also
questioned the statement that ethnocentrism
implies rigidity and
pointed out that the ethnocentric
person may after all be [...] to
change. In his view, neither the use
of scratch paper nor increased verbalization
provide evidence for
greater concreteness of thought. Rokeach
(1949) answered these criticisms
by admitting that “ethnocentrism
implies rigidity” is a premise,
and that social and emotional
implications existed in the test material.
He maintained that the instructions
were formulated so as to
make the Ss watch for “short” methods.
[...]
not a satisfactory method for measuring
rigidity, but said that the use of
scratch paper is. The dichotomy of
the prejudice scores obtained from a
very selected group might well allow
a spurious correlation to be found between
it and a test of rigidity which,
as we already know, gives a U-shaped
distribution of results.

Applezweig (1954) carried out a
fuller investigation of the relationship
between the water jar test and
the Rorschach [...]. These were the
Angyal perception [...],
hidden words test,
hidden objects test, and the California
Ethnocentrism Scale. Subjects
were 79 candidates for a U. S.
Navy submarine school. Since Ap- 
plezweig was also interested in the ef- 
fects of anxiety on rigidity, the Ss 
were divided into three groups. One 
was tested the day before, one the 
day after, and one a week after the 
candidates’ tests for entrance to the 
school. Her own words seem worth 
quoting here: “Of the 45 correla- 
tions among the six rigidity measures, 
22 are found to be negative, 21 are 
positive, and 2 are zero. Further- 
more, only three of these 45 correla- 
tions reach the .05 level of signifi- 
cance, and two of those three turn out 
to be negative correlations. There is 
with rare exception, no consistency 
between any two measures under all 
three conditions.” Applezweig sug- 
gests that the measures may not be 
reliable or they may be measuring 
different things. In fact, she goes on 
to explain the way in which she 
thinks that performance on the differ- 
ent tests may have been affected by 
the amount of stress in the situation. 
One test would appear more threat- 
ening in one situation than another. 
Hence the person would appear more 
rigid on it in the first situation. Of 
course, such arguments after the 
event are difficult to evaluate, unless 
they fit in with other findings. Ap- 
plezweig rightly concludes that she 
has “no evidence of generalized ri- 
gidity on six tests claimed to measure 
this trait,” but that she has some evi- 
dence that feelings of insecurity or 
anxiety will affect performance on 
tests of rigidity. Her work may be 
criticized on the grounds that not all 
types of rigidity test were included 
in her six. The results are more in- 
teresting, however, in that they con- 
tradict the findings of Rokeach and 
those of Cowen and Thompson now 
to be described. 
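
The figure of 45 correlations in the passage quoted above is consistent with six measures compared pairwise in each of the three testing groups, and the arithmetic can be checked as follows. This is one reading of the numbers, not a statement of Applezweig's procedure; the chance expectation also assumes, for illustration only, that the 45 tests are independent, which correlations among the same six measures are not.

```python
from math import comb

measures = 6      # rigidity measures in the battery
conditions = 3    # groups tested the day before, the day after, a week after

pairs = comb(measures, 2)        # distinct pairs of measures
total = pairs * conditions       # correlations across all three groups

# Counts reported in the quotation.
reported = {"negative": 22, "positive": 21, "zero": 2}
assert sum(reported.values()) == total

# Number of correlations expected to reach the .05 level by chance alone,
# under the simplifying assumption of independent tests.
expected_by_chance = total * 0.05
```

With about two significant correlations expected by chance, the three observed (two of them negative) give little ground for inferring a general factor.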

Cowen and Thompson (1951) tried
a direct attack on personality variables
and rigidity. They gave the
water jar test, the Bell and California 
personality inventories and the Ror- 
schach to 93 children in the eighth 
grade. On the basis of their scores 
on the water jar test, a rigid and a 
flexible group were distinguished. 
Those Ss who made mathematical er- 
rors and those who gave complex so- 
lutions to an initial control problem 
were eliminated, so that only 34 Ss 
were actually used. Of these, 17 were 
“flexible” and 17 were “rigid.” It was 
found that neither of the personality 
inventories were related to rigidity- 
flexibility. The Rorschach showed 
up certain differences between the 
groups in that the rigid group pro- 
duced fewer responses, were less able 
to organize and integrate the blots, 
and gave fewer “color responses.”
The protocols were subjected to the
ratings of judges for “over-all rigidity”
and these were significantly related
to the rigid and flexible groups.
In this investigation the Ss were a 
highly selected group of children who 
were good at arithmetic. It would 
not be safe to generalize on the basis 
of these results as Applezweig’s work 
in 1955 has shown. In fact, the re- 
sults of Rokeach, Applezweig, and 
Cowen and Thompson leave the rela- 
tionship between Einstellung tests, 
the ethnocentrism test and Rorschach 
test uncertain. 

Harway (1955) who worked on 
goal-setting behavior as an aspect of 
rigidity, suggested that rigid and 
nonrigid groups would behave differ- 
ently in a level of aspiration test. 
This followed from his acceptance of 
Goldstein’s concept of secondary 
rigidity. He thought that the rigid 
group might obtain either higher or 
lower scores on measures of varia- 
bility of goal-setting. The water jar 
test was used to measure rigidity, and 
the Rotter aspiration board, another 
water jar task and a hidden word task
were given as measures of level of as- 
piration. A rigid and nonrigid 
group were formed from the sixty-six 
subjects who successfully completed 
the water jar test. His results showed 
that “rigidity in problem solving 
seems to be related to greater varia- 
bility of goal-setting behavior” but 
there was a big overlap between the 
scores of the two groups. 

Levine (1955) tried to find a rela- 
tion between the water jar test scores 
of 100 male veterans and their time 
scores on a simple discrimination 
task, in which they had to say which 
of two figures was the larger. In 12 
out of the 22 cases, the figures were 
the same size. He found no relation- 
ship between the two scores, but as 
he points out it is only an assumption 
that a simple discrimination will 
measure impulsiveness or caution. It 
is also not proven that the water jar 
test measures “rigidity” in this 
sense. 

Guetzkow (1951) found no rela- 
tionship between water jar perform- 
ance and ability to solve the Maier 
pendulum problem by as many dif- 
ferent methods as possible. This was 
a good illustration of the fact that the 
ability to solve a given mathematical 
problem may be different from that 
required to seek out alternative solu- 
tions to a problem. In the one case, 
the S knows the end towards which 
he is working; in the other, the number
of possible solutions is not fixed.
Guetzkow also showed that “suscep- 
tibility to set” and “ability to over- 
come set” are two different things; 
that men and women are equally sus- 
ceptible to set, but that men appear 
to be better able to overcome it. 

Wolpert (1955) used five rigidity
tests which were all supposed to be
reliable, and sought for a general
factor among them. They were the
water jar test, the applicability of adjectives
to self and friends, the applicability
of adjectives to various social
situations, narrowness of attitude to 
the world, and the stability of repro- 
duction of lines. The tests were given 
to 38 volunteer students, and com- 
parisons were carried out between 
their rank order rigidity scores. No 
relationship between the tests proved 
significant, and Wolpert therefore 
concluded that there is no generalized 
factor of rigidity. His choice of tests 
may be questioned: stereotypy of 
description may not indicate rigidity 
of character or of intellect. His scor- 
ing of the water jar problem is also 
open to criticism. Does “the number 
of old method solutions to those prob- 
lems solvable by a shorter method 
plus the number of failures to solve 
those problems where the old method 
is completely inappropriate,” really 
constitute a satisfactory measure of 
rigidity? Other workers consider 
with good reason that “susceptibility 
to set” and “ability to overcome set” 
are two different things. It looks as 
though Wolpert’s conclusion that 
“rigidity is not a generalized factor,” 
takes too much for granted. 

Oliver and Ferguson (1951) car- 
ried out a factor analysis of a number 
of so-called rigidity tests, from which 
they found three factors. One of 
these was “habit interference or rigid- 
ity.” The tests which were loaded 
highly on this factor were substitu- 
tion of the normal meaning of arith- 
metic signs, giving the letter so many 
behind a given one in the alphabet, 
giving the opposites of the seasons or 
not, according to the printing, figure 
analogies and a same-opposites test. 
Those not included in the factor were 
rewriting mirrored words in the nor- 
mal way, anagrams, a number series, 
the Gottschaldt figures test and the 
Luchins water jar test. The tests in- 
cluded in the factor all need the re- 
versal of a previously over-learned 
habit, and, therefore, it seems possible
that the factor is one of “disposition
rigidity.” It is of interest that
at least two common Einstellung 
tests of rigidity, anagrams and the 
Luchins water jar, are not included in 
the factor. 

This is, perhaps, the most useful of 
all the investigations employing the 
water jar test, since it has suggested 
that Einstellung tests may differ in 
kind from other tests of rigidity. It 
is the more unfortunate that the bat- 
tery did not include a test known to 
be high in g and some of the more 
commonly accepted “disposition ri- 
gidity” tests. 

Pitcher and Stacey (1954) used the 
Guilford-Zimmerman temperament 
survey, and an Einstellung test of 
similarities between words, in their 
attempt to see whether rigidity could 
be described as a general personality 
trait. The similarities-between-words 
test was so designed that the first 
seven items should have elicited con- 
crete or functional similarities, the 
next two, abstract similarities, while 
the last four could be answered in 
either way. The 373 Ss were divided 
into seven groups, ranging from those 
who gave the most rigid—that is the 
greatest number of functional sim- 
ilarities—to least rigid or most ab- 
stract. The scores on the 10 traits of 
the Guilford-Zimmerman scale were 
compared by analysis of variance to 
see whether there were any differ- 
ences related to rigidity. Two sig- 
nificant differences were found, on 
Ascendance and Masculinity. The 
latter varied according to the respec- 
tive numbers of male and female Ss 
within the seven groups. The main 
contributions to the significance of 
Ascendance came from the least 
rigid, who scored more highly on this 
trait. Pitcher and Stacey point out 
that this agrees with Cattell’s find- 
ings that the more rigid are the more 
submissive. However, one should not
generalize from disposition rigidity 
tests to Einstellung tests in this way, 
without further direct evidence that 
the two are related, particularly in 
view of Oliver and Ferguson’s work 
in 1951. The results were obtained 
from one test which had not been 
studied in relation to any other tests. 
The results “do not support the hy- 
pothesis of a generalized rigidity 
factor,” but neither do they refute 
this hypothesis. 

French (1955) gave the water jar 
test, California F scale, a changing 
figures test, preferences for design 
and two closure tests to 50 airmen 
under stress and 50 under nonstress 
conditions. She concluded that ego- 
involving conditions did not produce 
an increase in rigid behavior and 
that there was no evidence of gen- 
eralized rigidity among the tests. 
Only the two closure tests were sig- 
nificantly related under both stress 
and nonstress conditions. 


Concept Formation Tests 


Wesley (1953) gave a card-sorting 
task requiring change of concepts 
without warning to Ss chosen accord- 
ing to their scores on the Wesley ri- 
gidity inventory and the Taylor 
Manifest anxiety scale. There were 
21 rigid Ss, 21 anxious Ss, and 30 nor- 
mal Ss. All took equally long to learn 
the initial concept, but the rigid 
group took significantly longer to 
shift its set (P=.04). The rigid 
group also made significantly more 
perseverative responses than the 
anxious group (P=.02), and the normal
group (P=.05). The rigidity
scores were, by the nature of the 
sampling, not related to scores on the 
Taylor Manifest Anxiety scale, and 
the scores on the latter did not seem 
to show any clear relationship to the 
sorting test. From these results it 
seems that it is possible to find rigid 
people who are not anxious and anxious
people who are not rigid, though
Gaier (1952) found that the two went 
together, at any rate at the time of 
testing. 


Wesley Scale 


The Wesley inventory was also 
used by Zelen and Levitt (1953) in an 
attempt to find out whether their 
group level of aspiration test could 
be regarded as a measure of rigidity. 
The other tests were the California 
Ethnocentrism Scale, and the Rotter 
board. All three measures proved to 
be significantly related to their test.

Jones (1954) measured the scores
on the California F Scale against the
rate of fluctuation in the Necker cube 
illusion, the Guilford-Zimmerman 
Temperament Survey, and the Wes- 
ley Manifest Rigidity Scale. There 
was a significant negative correlation 
between the California F Scale and 
the rate of fluctuation of the Necker 
cube; a significant positive correlation
between the F scale and two
anxiety and two hostility scales on 
the Guilford-Zimmerman survey; 
and a significant positive correlation 
between the F scale and the Wesley 
Manifest Rigidity Scale. 


Hidden Objects 


Pullen and Stagner (1953) adopted 
Cattell’s definition of rigidity and 
used four of the tests suggested by 
him in order to measure it. These 
were a motor test of creative effort, a 
hidden objects test, riddles and
flicker fusion. Hidden words were
also used, and new tests incorporated.
These were naming colors finely
graded from one shade to the next,
naming changing figures, and naming
changing pictures. The Wechsler-Bellevue
was included in the battery
as a reference test. The rigidity score
for the last three tests mentioned was
the number of instances in which the
Ss gave conflicting opinions when the
colors, figures, or picture series were
started at different ends of the con- 
tinuum. Subjects were 60 psychotics, 
35 of whom received convulsive shock 
therapy. Two factors, “rigidity”
and “intelligence,” were found on
centroid analysis. The changing 
colors, figures and creative effort sub- 
tests were the most heavily loaded on 
the rigidity factor while hidden ob- 
jects, hidden words, riddles, and lack 
of flicker fusion overlap contributed 
mainly to the intelligence factor. 
After shock therapy, the rigidity 
scores decreased. This rigidity factor 
seems likely to be the same one that 
Cattell called “disposition rigidity,” 
although not all Cattell’s tests were 
loaded on their rigidity factor. This
is, in fact, one of the more disturbing
factor analytic studies because of the
high negative loading of four “rigidity”
tests on intelligence.

Cattell (1946) himself found that 


it so that it became unfamiliar, were 
heavily loaded on the disposition 
rigidity factor. When he gave the
same 100 Ss his 16-personality factor 
test, he found that this type of rigid- 
ity was related to Submissiveness, 
Character Integration, and, to a 
lesser extent, Stability and Surgency. 


Aniseikonic Lenses 


Becker (1954) using aniseikonic 
lenses, found that time delay in seeing 
the distortion and the degree of dis- 
tortion experienced were related to 
certain Rorschach measures, among 
them the weighted scale developed 
by Fisher (1950). 

Martin (1954) also experimented 
with aniseikonic lenses and found 
that those quick to perceive distortion
asked few questions in an ambiguous
interview situation. This
suggests that asking few questions
may be a measure of flexibility and
ease of adjustment rather than of 
rigidity. 

Ethnocentrism 


Using the California Ethnocen- 
trism Scale and an unpublished Toler- 
ance of Ambiguity Scale consisting of 
questions about change and per- 
severance, which was devised by 
Walk, O’Connor (1952) found that a 
complex relationship existed between 
these two and ability to carry out 
syllogistic reasoning. Intolerance of 
ambiguity was related to poor ab- 
stract reasoning only when it was 
associated with ethnocentrism. 

Rokeach (1951) found that ab- 
stract definitions of religion and poli- 
tics were least common in the most 
racially prejudiced. On the other 
hand, Eriksen and Eisenstein (1953)
found that ethnic prejudice was not 
related to the number of hypotheses 
a person would put forward to am- 
biguous stimuli. Thus, though ethno- 
centrism may be related to formalized 
reasoning, it does not appear to be 
related to what might be called “crea- 
tive reasoning.” This agrees with the 
results found to hold for the water jar 
experiments. 


Rorschach 


The experiment by Eriksen and 
Eisenstein (1953) included an at- 
tempt to get the S to accept certain 
concepts in the Rorschach. They 
found that the tendency to reject 
these concepts was inversely related 
to the number of hypotheses a person 
would offer to an ambiguous stimulus. 
In some ways, this finding is surpris- 
ing; one would not expect that readi- 
ness to accept the concepts offered 
would go with an ability to produce 
numerous concepts on demand. In-
deed, what might be called “passive”
tests of rigidity have not in other in-
stances linked up to the “creative”
ability to seek out solutions or pro-
duce ideas.

Gaier (1952) has followed up this 
line of reasoning by looking for rela-
tionships between the Rorschach,
which he used to gauge anxiety, rigid-
ity, and negativism, and the kind of
thinking (“rote” or “thought”) pres-
ent in a learning situation. His Ss
were 11 college freshmen of approxi- 
mately equal ability who were inter- 
viewed about their reactions during 
class while hearing a recording of that 
class. Their first year examination 
performance was analyzed according 
to the type of cognitive processes they 
had used. Their Rorschach protocols 
were examined and ranked by two 
experts for anxiety (“whether self en- 
grossed and concerned with personal 
adequacy”), rigidity (F+ and A re- 
sponses) and negativism (white space 
responses). Anxiety was found to be 
significantly but negatively related to 
performance when comparing famil-
iar and unfamiliar material (rho
= -.61) and positively to the
amount of thought given to the self in
class (rho = .60). There was a positive
correlation between high rigidity rat-
ing and rote recall (rho = .73) and a
negative correlation between high 
rigidity rating and problems or ideas 
requiring new methods of attack. 
Gaier concludes that “Anxiety and
rigidity characterize those individuals
less capable of improvising in a new
problem situation.” Of course, there
could be other reasons for the lack of 
creative thought, such as low ability, 
or low fluency of thought, and Gaier’s 
work gives no clue as to which comes 
first, rigidity or anxiety. 
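
The rank correlations quoted above can be illustrated with a minimal sketch of Spearman's rho for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The source does not say which computing formula Gaier used, so the untied-ranks formula is an assumption, and the function names and the two example rankings are hypothetical, not Gaier's data.

```python
def ranks(values):
    """Return 1-based ranks, largest value ranked 1 (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length sequences without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    rank_x, rank_y = ranks(x), ranks(y)
    # Sum of squared rank differences across subjects.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ratings for 11 subjects (illustration only, not Gaier's data).
rigidity_rating = [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
rote_recall = [10, 11, 8, 9, 7, 5, 6, 4, 2, 3, 1]
print(round(spearman_rho(rigidity_rating, rote_recall), 2))
```

With only 11 subjects, as in Gaier's sample, a few swapped ranks shift rho noticeably, which is one reason such coefficients are best read with caution.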

The number of white space re- 
sponses on the Rorschach was also 
used by Bandura (1954) who found it 
to be related to the rate of fluctuation 
of the Necker cube—a favorite test of 
sensory perseveration if not of 
“rigidity.”

Fisher (1950) attempted to use a
“rigidity profile” consisting of the 
Vigotsky block test, change in level 
of aspiration during a hand steadiness 
test, liking for colors, trait judgments 
from photographs, comparison of self 
with pictured persons, number of in- 
terpretations to TAT pictures, num- 
ber of things found annoying, range 
of interests, Rosenzweig picture frus- 
tration test, and blot representation 
test. Against this profile he used the 
Rorschach. His first measure was of 
“personality rigidity,” diagnosed by 
such signs as restricted responses and 
limited use of color. His second meas-
ure was of “personality maladjust-
ment” as shown by various “clinically
recognized signs.” He also used Guil-
ford’s S.T.D.C.R. test to get “a uni-
form means of analyzing an S’s way of
describing herself” (that is, a meas-
ure of the S’s favorable or unfavorable
descriptions of herself). There were 60
female Ss, divided into normal, 
paranoid, and hysteric groups. Very 
little overlap occurred between the 
subtests of the rigidity profile and 
there was no consistent overlap be- 
tween the different subject groups. 

This is hardly surprising, in view of
the assorted mixture of the tests,
some of which would not be generally
recognized as measuring rigidity at
all, but rather narrowness of outlook.

He found that his rigidity profiles 
did correlate with the Rorschach 
measure of rigidity, and with the “un- 
favorable” S.T.D.C.R. score, and 
that in general, the more disturbed 
the patient, the higher her “rigidity” 
score. He makes the point already 
mentioned by Goldstein that rigid 
people in one situation may react by 
extreme flexibility in another and 
that it may be that “personality 
rigidity may manifest itself in diverse, 
often unrecognized ways.” He sug- 
gests that three levels of emotional 
stress are involved in his tests. Some,
like the Rorschach, cannot be an-
swered without a great deal of emo- 
tional impact, while others, such as 
liking for colors, contain no emotional 
significance. Nothing is done to show 
whether this analysis is consistent 
with the data. 

In view of the assortment of tests 
used, it may well have appeared that 
no single one of these measures is suc- 
cessful as a measure of general rigid- 
ity, but Fisher was not justified in 
stating on the basis of his results that 
the concept of generalized rigidity is 
largely fictitious. 


Flexibility in Thinking 


In a factor analytic study of flexi-
bility in thinking, Guilford, Frick,
Christensen, and Merrifield (1957)
found two flexibility factors called 
“spontaneous flexibility” (i.e. free- 
dom from inertia in thinking) and 
“adaptive flexibility” (i.e. restruc- 
turing of interpretations and ap- 
proaches). These factors are not 
thought likely to be the same as any 
psychomotor rigidity factor, but per- 
ceptual flexibility seems to be a form 
of adaptive flexibility. 

Thirty-two test scores were ob- 
tained from 28 tests, administered to 
208 air cadets. The test battery took 
six hours to complete. Eleven of the 
tests were new, and five were “ref- 
erence tests.” The tests which had 
loadings of above .30 on spontaneous 
flexibility, and on this factor only, 
were: 

“Brick uses”: think of as many
different uses as possible for a com-
mon brick.

“Object naming”: write as many
objects as possible belonging to a cer-
tain class.

“Unusual uses”: list other uses for
a common object for which a stand-
ard use is given.

“Impossibilities”: list all the im-
possibilities that can be thought of.

Tests having loadings above .30 on
adaptive flexibility only were:

“Match problems”: remove a
specified number of matches from a
given design to leave another named
design, by several methods.

“Penetration of camouflage”: find
human faces hidden in a series of pic-
tures.

“Squares”: place a specified num-
ber of crosses on a checkerboard so
that no two are in the same row,
column, or diagonal; give alternative
responses.

Of these tests, only the “hidden 
faces” has any parallel with those 
previously discussed. 

Tests which have appeared in 
other research and which appear in 
this study in other factors than flexi- 
bility are: 

Hidden words (verbal comprehen- 
sion and structural redefinition). 

Water jars (reasoning and logical 
evaluation). 

Sign changes in arithmetic (reason- 
ing). 

Riddles (originality, if “clever,”
and ideational fluency, if “obvious”).

It may well be that this distinction 
between spontaneous and adaptive 
flexibility in thinking is as im- 
portant as that drawn by Cattell 
between process rigidity and struc- 
tural rigidity in psychomotor per- 
formance. The distinction has al- 
ready been drawn more than once in 
this article between situations which 
impose a bounded area for adaptation 
and those which give a limitless 
choice and which seem to require 
some sort of creative ability. Adap- 
tive flexibility seems to resemble the 
former case and spontaneous flexibil- 
ity, the latter. 


Summary and Discussion 


It should be noted that 47 tests
have been mentioned, and that the
overlap between experiments has
been extremely small. Where two
people have used the same two tests, 
their results hardly ever agree and it 
is hard to say whether this is due to 
faults in the tests or discrepancies in 
the conditions, administration, and 
scoring of the tests. 

Perhaps the most obvious need in 
the whole field is for a study of the 
relationship between the well-known 
tests of rigidity—an Einstellung test, 
concept formation test, personality 
test, perceptual test, and disposition 
rigidity tests. However, the writer, 
like previous investigators, feels that 
the design and scoring of some of 
these tests should be altered. Even 
if no new tests were introduced into 
such a battery it would not be com- 
parable with previous batteries. It 
would be consistent within itself in 
that each test would seek to measure 
the difficulty of change of habits in the 
face of new demands to change. 

Cattell’s warning that each new 
aspect of rigidity studied should be 
linked to the factors already known 
should be remembered; for example, 
a measure of intelligence should be 
included in any battery to serve as an 
anchor for a factor analysis. 


EXPERIMENTAL WORK 
Training 


Many workers have tended to 
ignore the quest for a general factor 
of rigidity. They have concentrated 
on situations which elicit rigid be- 
havior and the methods by which 
such behavior may be modified. The 
first group of experiments described 
here deals with the effects of different 
“training experiences.” When a per- 
son has built up a concept as a result 
of his own experiences, under what 
circumstances and in what ways can 
he alter it most easily? 

Type of alteration achieved. Several 
experiments have been carried out on
the type of alteration which is most
easily achieved. Walk (1952) used
wooden blocks of different size, shape,
and color, and later photographs of
various facial expressions and poses 
in a concept formation test. He gave 
a third of his Ss training in one prin- 
ciple, a third training in the opposite 
principle, and a third no training. 
The first principle was then used for a 
criterion run. The group which had 
to reverse the principle learned dur- 
ing training gave the worst perform- 
ance on the criterion test. The experi- 
ments show that, as might be ex- 
pected, it is harder to alter a concept 
completely than it is to use the orig- 
inal one, or than it is to develop a new 
one. No tests were given in which 
training was of such a nature that the 
concept had to be partially altered on 
the criterion run. 

Buss (1950, 1952, 1953a, 1953b) 
gave college students a concept for- 
mation task which involved block 
sorting. He found that concept re-
versal was easier than having to de-
velop an unrelated concept or one
overlapping with the original vari-
able. Buss suggests that discrimina-
tions of this type are learned by the
differential reinforcement of the vari-
ous stimuli. The number of stimuli
of one kind presented in th


in the criterion series both j
the success of the Ss.

Kendler (1953) carried out similar
experiments using the New York
University card sorting test, and ob-
tained similar results. However, he
attempted to test out his own ex-
planation of the easier reversal shift.
This is that concept formation sorting 
tests depend on the ability to ver- 
balize the concepts. Those easier to 
verbalize will be learned more quickly. 
His crucial experiment involved omit- 
ting all the cards in the learning of the 
second concept which would give
partial reinforcement to the first con-
cept. However, there was no signifi-
cant difference between the reversal 
and nonreversal groups, and it proved 
difficult to evaluate the results since 
the correct sorting for both reverse 
shape and reverse color were the 
same.

The important thing about this 
work is that Buss and Kendler ob- 
tained similar results using different 
materials and working independently. 
In the formation of concepts, it seems 
that direct reversals of concepts al- 
ready known are easier to learn than 
partial alterations. This may have a 
possible application in the field of 
learning and retraining.

Effect of reinforcement. The effect
of the type of reinforcement given
during learning is not at all clear.
Buss, in the work already quoted,
found that continuous reinforcement
led to less readiness to reverse the
previously learned discrimination.
However, Sheffield (1949) worked
with rats and found that the rate of
extinction of alley running habits was
greater for those whose responses had
been reinforced all the time. In other
words, they appeared to give up their
habit more readily if it had been con-
tinuously reinforced and then sud-
denly not reinforced at all.

It seems that “reversal” and “ex-
tinction” are different phenomena,
but that continuous reinforcement
leads the Ss to feel very sure of the
“rightness” of the choices they are
making.

Effect of criterion run. The length 


of the successful criterion run also ap- 


lingness to 




perimenter altered the correct sorting 
category without warning. A similar 
result was found by Kendler; those 
with the greatest number of correct 
responses dropped the response most 
quickly. On the other hand, they also 
proved quicker at relearning the same 
concept later. It seems that those 
who are surest of their concept in the 
first place are the quickest to realize 
that the concept has been changed 
and that they need to abandon the
idea which was previously correct.

Effect of “set.” In situations which
involve “set,” however, those who
have become most practiced in the
“set” method are often least willing
to abandon it. The number of arith- 
metic problems given by Youtz 
(1948) to form the “set” ranged from 
five to 40. He then introduced a new 
series of problems which were in-
soluble by the “set” method. He
found that the greater the number of
“set” problems, the greater the time
taken on the new type of problems. 
No data were given on the rate of 
increase of speed on the new prob- 
lems, and it seems likely that “in- 
sight” occurred fairly early in the 
“test” series. Maltzman and Mor- 
risett (1952) showed a similar effect 
to be present in the solving of in- 
compatible anagrams. On the other 
hand, Tresselt and Leeds (1953a) dis- 
covered no difference in the number 
of criterion problems solved when the 
number of “set” problems was in- 
creased beyond a certain limit. In 
their experiment, the best number of 
“set” problems proved to be between 
six and eight. It seems possible that 
a break in the series may have a 
definite effect here, in some way 
breaking the habit of response more 
for less practiced responses. Or again,
it may be that time taken is a more
sensitive measure than number solved
if it is a question of
realizing that a new
method of solution is needed.

Both Rees and Israel (1935) and 
Hunter (1956) have shown that 
specified sets help in solving anagram 
problems. The more specific the set, 
the more quickly their Ss solved the 
anagrams. Weaver and Madden 
(1949) concluded that in solving 
problems such as Maier’s “pendu- 
lum,” Ss used only directions of 
thought and habits of search with 
which they were already familiar. 
Hints about “direction” and “parts” 
of the solution enabled more Ss to 
solve the problem. 

Effect of practice. The type of prac- 
tice seems to influence the degree of 
rigidity shown in problem solving, and 
the degree to which a “set” is devel-
oped. For example, Kendler, Green-
berg, and Richman (1952) showed 
that massed practice rather than 
distributed practice favored the de- 
velopment of a “set” in solving arith- 
metic problems. Half their Ss were 
given the problems under massed 
practice conditions with no interval 
between the solution of one problem 
and the presentation of the next, 
while the other half had a three min- 
ute interval between successive prob- 
lems. The strength of the set was 
measured by a test problem which
could be solved by either the set or
nonset method. Sheffield (1949),
working with alley-running rats,
found that the extinction rate was
dependent upon the reinforcement in
massed training but not in spaced
training. Kendler (1953) found that
mental sets were more effectively
weakened when they were tested
under massed practice conditions.
His first 20 anagrams were all food
words, all solvable in the same way.
They were presented to three groups
of 10 Ss at 15-second intervals. Dur-
ing the test series, the Ss were given
nature words not solvable in the
“set” way, one group with no in-
terval between solution and presenta-
tion, and the second group with an
interval of 30 seconds. Over the first
six anagrams of the test series, the first
group had decidedly fewer failures 
than the second group. After this the 
difference decreased. Taken together, 
these experiments suggest that 
massed practice favors the develop- 
ment of set but that massed practice 
during testing or during extinction 
tends to favor extinction. Such find-
ings do not necessarily apply to the
solving of problems other than these
short ones involving “set.”


students into three g 
them a verbal reasoning problem, 
The first group received massed prac- 
tice for 20 minutes, the second re- 


ceived practice of 60 seconds work 


Preparation for problem-solving.
There has been a good deal of dis-
cussion as to the best way in which


roblem solving 


proficient in solving new 
the same class than a gr 
on a number of differen 

The latter, however, had a smoother
transition to new problems than the 


learning sets, or ide 
lems should be tack 


training actually widens adaptability. 
Schroder and Rotter (1952) agree
with this. They used a card sorting
task with four groups of Ss and they
altered the training in expectancy of
change given from group to group.
Flexibility, they say, consists of ex-
pecting changes and looking for al-
ternative pathways all the time. 
Thus it is training in expectancy of 
change which is required, and not 
training in one single solution satis- 
factory at the moment. Just giving 
no instructions is not enough to pre- 
vent a “set” being built up: Postman 
and Leytham (1951) found that a se- 
lective set may be induced either by 
instructions or through experience on 
the task. 

Again, the proba! 

being correct was found to influence
the solutions to matching problems
by Goodnow and Postman (1955).
In this task, the Ss were not told that
the probabilities varied or what they
were.


Limited p 
itself have a harmful eff, 
Problems are fa 
strated that inability to use an object
for a “strange” purpose may be due
to the previous use made of the ob-
ject. Birch and Rabinowitz (1951)
also showed this effect, and three of 


1 Taylor (1954) 
showed that this “functional fixed-
ness” is related to inability to over-
come set on the Luchins water jar
test (i.e. the extinction problems), but
not to susceptibility to set. They
also found that the longer the time 
interval between the pri 
“fixed” article, the more 

, chance 
there was that the S would overcome
in its new con- 
as shown that 
å . Seconds between 
presenting an Einstellung problem
and allowing the S to write down the
answer cuts down the number of
“rigid” answers received. This points
to the importance of allowing time to 
think—a factor which may normally 
depend either on the imposed phys- 
ical situation, or on some personality 
variable. However, using intervals of 
one to seven days between practice 
and “critical” problems seemed to 
increase the susceptibility to set in 
an experiment by Tresselt and Leeds 
(1953b). Here, obviously, factors of 
memory are influencing the situa- 
tion. 

Summary and discussion. The ex- 
periments just described have dealt 
with the effects of previous experi- 
ence. 

It has been shown that reversing a 
previous concept is easier than par- 
tially altering it and that both these 
are, of course, harder than sticking to 
the original concept. 

It has been demonstrated that 
specificity of set, or training in a par- 
ticular solution, leads to maximum 
efficiency so long as problems are con- 
fined to ones which require that solu- 
tion. Where a variety of problems 
have to be faced, a wider training 
with emphasis on the need for change 
is advisable. In this connection, the 
effects of limited experience and the 
previous use of an object in one con- 
text only have been illustrated, and it 
has been suggested that solutions to 
problems are, in fact, evolved from 
previous experience rather than 
thought out from first principles. 

The experiments have not settled 
all the problems that they have 
tackled. For example, the effects of 
different types of reinforcement are 
not entirely clear. In most of the ex-
periments, reinforcement given all
of the time appears to make Ss surest
of their concept. When the experi-
menter changes his criterion, the Ss
will drop the concept most quickly
and also relearn it most quickly if
their responses have been reinforced 
all the time. But Buss found that 
Ss given 100% reinforcement while 
learning a concept appeared to be 
less ready to change it. It is not clear 
why this should be so, and further 
work is needed on this point. 

Again, the relative effect of massed 
and distributed practice on “set” 
problems is not certain. It seems 
from the experiments quoted that 
massed practice favors the develop- 
ment of “set,” yet it also tends to 
favor extinction later. The important 
variable here seems to be the point at 
which the massed practice is experi- 
enced. 

Similarly the effect of the length of 
the series of “set” problems in Ein- 
stellung tests is not clear. In some 
experiments, the longer the series, 
the greater the difficulty of change. 
In other experiments there appeared 
to be an optimum number for the 
“set” series. The conflict here seems
to be due to the different methods of 
measuring “difficulty of change” and 
could probably be resolved fairly 
easily by the use of a “time for solu-
tion” criterion instead of a “correct
solution” criterion.


State of the Subject 


Anxiety. The degree of rigidity in 
problem solving depends partly on 
the state of the S at the time, and 
anxiety is one factor which has been 
investigated. Some experimenters 
have induced anxiety by giving dis- 
turbing and false details about their 
Ss’ personality test results. Cowen 
(1952a, 1952b) used this technique. 
A variation in this procedure is to 
build up feelings of ego involvement 
in the task. Marks (1951), Osler 
(1954), Beier (1951), and Brown 
(1953) have used such methods. 
When anxiety was aroused, the degree 
of rigidity shown was found by Brown
to be related to authoritarianism.
Other investigators, such as Maltz-
man, Fox, and Morrisett (1953),
have examined the relationship be- 
tween Einstellung rigidity as meas- 
ured by the water jar test and an 
anagrams test and generalized anx- 
iety as measured by the Taylor 
Manifest Anxiety Scale. They found 
that the tendency to shift from the 
“set” solution was less for those with 
high anxiety scores. Applezweig 
(1954) found that the anxiety in- 
duced during the selection of sub- 
marine candidates affected their per- 
formance on rigidity tests, though it 
did not in all cases increase the 
amount of rigidity displayed. 

Ross, Rupel, and Grant (1952) 
gave the Wisconsin card sorting test 
under four different conditions and 
found that only irregularly given 
electric shocks affected the Ss’ ability 
to form and change the correct sort- 
ing concepts. The other conditions 
of testing were an impersonal situa- 
tion, personal heckling, and imper- 
sonal auditory distraction. The 
“shock” situation is the only one of 
these which seems likely to have 
caused anxiety.

Needs. Reactions to “needs” such
as thirst or hunger have been found
to be related to certain types of


havior by Klein 
man and Crutch- 


lein demonstrated 
that those who could ignore a color
word when naming another color in
which the word was typed also gave
less thirst-associated words in re- 
sponse to a thirst cue word when 
they were actually suffering from 
thirst. Postman and Crutchfield 
demonstrated that the relationship 
between intensity of hunger and fre- 
quency of food responses depended 
on their Ss’ selective set for such 
responses. Both investigations found 
that needs could influence set and 
rigidity.

Sensory deprivation. McAndrew’s
work (1948) on the deaf and blind 
should be mentioned here, though he 
uses the Lewinian concept of rigidity 
which means that his findings must 
be interpreted with caution. He 
worked with 25 deaf, 25 blind, and 
25 normal children. He found that 
the deaf and blind took longer to 
become tired or “satiated” with sim- 
ple modelling tasks and that the deaf 
had more difficulty in restructuring 
in classification tests. Deaf children 
more often rejected Rorschach cards 
than normal children and they gave 
more “all or none” reactions to a 
level of aspiration task. From this he 
deduced that both deaf and blind 
children displayed more “rigidity” 
than normal ones, and that rigidity 
is a positive function of isolation. De-
prived children probably have nar-
rower worlds and more limited vo-
cabulary than normal ones and this 
lack of experience might well account 
for their poorer reclassifications and 
responses in tasks which require 
verbalization. 

Age. Kounin (1941) and Werner 
(1946b) have both investigated rigid-
ity in children. Kounin came to the
conclusion that it increased with age,
and Werner that it decreased. Kou-
nin holds the Lewinian view of rigidity.
He contrasted normal children with
feeble-minded adults of the same IQ.
Thus there is some doubt whether age
or feeble-mindedness is the chief vari-
able here.

Bromley (1953), dealing with aged
adults, found that they were unable
to shift or hold two aspects of the
same pattern simultaneously. In re-
sponding to Raven's Progressive 


personal as- 
al ideas, and more 

concrete answers. 
may be called more 


that 22 Ss between 58 and 73 were
unable to shift concepts when the 
experimenter altered his criterion 
without warning. What is more, if the 
Ss did not at first hit on the same 
criterion as the experimenter they 
seemed unable to explore the other 
possible criteria. Among a group of 
college students, on the other hand,
15 were successful and verbally recog- 
nized that the experimenter had 
shifted, 21 were successful though not 
sure what the experimenter had done, 
and 15 failed to shift or to recognize 
the experimenter’s actions. It seems 
that in this sort of activity, older 
people may be less good than younger 
ones—but they might have been able 
to perform as well if told to look for 
changes. Some support for this idea 
is provided by Heglin (1956). He 
studied the performance of different 
age groups on the water jar and al-
phabet maze Einstellung tests, both 
with and without training. Training 
consisted of instructions to look out 
for and avoid the influence of a set 
approach. He found that before 
training the older Ss showed more set 
than the younger ones. After train- 
ing, however, his middle-aged group 
showed the least amount of set, the 
youngest group came next and the 
oldest group did worst. He makes the 
important point that the older the 
group of Ss, the longer the time taken 
to solve the problems both before and 
after training. 

The complexity of the problems 
used has been found to affect Ss dif- 
ferentially according to their age. 
Thus, Clay (1954), using groups of 
64 Ss under 25 and 64 over 55, showed 
that the performance of both groups 
was similar on the simplest task but
[...] increase in complexity
[...] time, were
[...] and less active in cor-
recting errors. Her Ss were asked to
[...] each stamped with
[...] from one to four, in rows
and columns on a squared board to
give the marginal totals printed on 
the board. Four problems were given, 
varying from nine squares to 36. 

Fattu, Kapos, Ervin, and Mech 
(1954) found a negative but complex 
relation between age and problem 
solving on a gear train apparatus.

Kay (1951), too, has found that
whereas performance deteriorates 
comparatively slightly with age on a 
simple task, the deterioration is more 
marked on a more complex one which 
involved combining the tasks al- 
ready learned. Again both time and 
errors rose, and Kay traces one cause 
of the errors to inability to discard 
wrong procedures previously learned. 
His apparatus consisted of a row of 
10 lights placed above 10 Morse keys, 
to which they could be connected in 
any serial order. His Ss, ranging in 
age from 20 to 70, had first to learn 
two five-light series, to relearn each
of the series, and then to put them 
together in an alternation test. 

Summary and discussion. Anxiety 
has been found to alter Ss’ perform-
ance on rigidity tests, but there is no
clear-cut relationship holding for all 
Ss and all tests. It is not surprising 
to find individual differences here, and
the work by Brown (1953) suggests 
that a complex relationship may exist 
between degree of anxiety aroused, 
type of personality, and performance 
on rigidity tests. 

It is suggested that “needs” may 
influence the “set” of the S and 
hence affect his rigidity on tests
which happen to be related to the 
“needs” in question. 

Because of the stimulating effect 
of other people it seems likely that 
most people would be more rigid
when isolated from their fellow-men.
But it also seems likely that individ-
ual differences in rigidity would still
exist under conditions of sensory or
mental isolation. 

As far as “age” is concerned, it
seems that both speed of performance
and accuracy appear to deteriorate 
with age on problems which require 
thought. The differences become 
more marked the more complex the 
task involved. The cause of errors 
appears to be due in part to the in- 
ability to change “set” and to look 
for new ways of doing things. There 
is some evidence that training can 
help to inculcate a “set” of looking 
for improvements in methods, though 
the effect of training may not be 
similarly successful with all age 
groups. 

There does not appear to have 
been any work on the differential 
effects of age on anxiety and hence 
on rigidity. The effects of various
training methods on groups of differ-
ing ages are not known. Ways of
altering “set” have not been com-
pared and it is not known which
methods are most effective with par-
ticular age groups. In itself, age is

writer’s 
reason for becoming interested in 
rigidity at all was due to the convic- 
tion that the topic should have some- 
thing to offer in the study of learning 
difficulties among those of middle-
age and over. While it has become
evident that fundamental research
will have to be carried out before
tests of rigidity can be used in the
study of age and learning, it may be
possible to do the two things at once
to some extent. Most of the gaps
outlined above will have to remain
for some time, however.


Discussion 


There seem to be three possible ap-
proaches for the experimenter in the


1. At the moment, each test must
be taken to stand alone. Failure on
any one test could be used as an
operational definition of rigidity.
For those who wish for rapid results,
this may be the only course. Follow-
ing it seems, however, to have led to
the muddle in which the topic now 
stands. Indeed, if this approach is 
followed, we may have to call rigidity 
by other names, each specific to the 
area of behavior involved. For ex- 
ample, for learning tasks, it may be 
more useful to talk about “change of 
expectancy” or “change of set.” It 
would be possible to use certain find- 
ings from the general field of rigidity 
—that intelligence and previous ex- 
perience have to be controlled, that 


(very important 
for Ss starting with feelings of in- 
feriority), and that our criterion for 
adaptability must involve a necessity 
for adaptation with penalties for fail-
ure.

2. The second possibility would be
the most fundamental. This would
be to adopt a theoretical approach to 


to develop a 
ory in a satis-
factory way. For example, it has al-


“given demand to change” (“ 


i en be- 
tween individuals given the same 
test and not between the behavior of 


on different individ- 
there is a
wide variation due to other factors
such as intelligence involved in the 
tests, this may account in part for 
the inconclusive results so far ob- 
tained. The second course may, 
therefore, not be possible until we 
have more knowledge of the composi- 
tion of the individual rigidity tests 
most often used at present, and the 
second and third methods of ap- 
proach seem to be best regarded as 
complementary. 

3. In discussing rigidity it is easy 
to confound behavior with the under- 
lying influences on behavior. We 
have the dichotomy made clear by 
Cattell when he points out that the 
causes of rigid behavior are likely to 
be threefold at least—and to lie in 
lack of intelligence, personality dif- 
ficulties, and “disposition rigidity.” 
Goldstein too, has emphasized the 
difficulty, pointing out that the rigid 
individual may react by extreme 
flexibility as well as by extreme in- 
flexibility, while the Lewinians openly 
choose to regard rigidity as a cause 


rather than as a form of behavior. 
Whether “behavior” or “cause” will 
be of interest to us depends on the 
general problems which we wish to 
solve. We may start with a piece 
of behavior—say difficulty in learn- 
ing a new job—and attempt to an- 
alyze out the various causes. Or we 
may start with a factor like “disposi-
tion rigidity” and study the wide
number of situations which it will 
affect. Whichever approach we 
adopt, it is essential to have tests 
available which measure rigidity as 
accurately as possible, and a battery 
of tests which may show us how 
many things other than rigidity are 
helping to make up any piece of be- 
havior in which we are interested. It 
is for this reason that it seems desir- 
able, in spite of the many incon- 
clusive results scattered through the 
literature, to advocate the factor anal- 
ysis of a chosen battery of the 
rigidity tests described in this paper; 
work along these lines is now being 
carried out by the writer.


DATA DISTORTIONS DUE TO INHERENT
DIFFERENTIAL SAMPLING


ALEXANDER M. BUCHWALD 
Indiana University 


Frequently an experimenter is in- 
terested in a phenomenon (either a 
particular response or a relatively
complex effect) which does not occur 
in all possible instances in his experi- 
ment. In such experiments it is usu- 
ally possible to obtain two different 
kinds of data: relative frequencies of 


acteristics include such diverse meas-
ures as response latencies or ampli-
tudes, increases in sensory thresholds


“inherent differential sampling” upon
ata in the f 


Inherent differential sampling des-
ignates the state of affairs


menon from
varying proportions of the Ss, and 


the entire sample. Put in a some- 
what different way, inherent differ- 
ential sampling occurs when: (a) 


rence of the phenomenon, (b) prob-
ability of occurrence is related over
to some quantitative
characteristics of the phenomenon,
and (c) the experimental conditions
vary widely in the sets of probabil-
ities of occurrence which they en-
gender in the Ss.


acteristics, while others do not. 

To illustrate how inherent differ- 
ential sampling can distort the rela- 
tionship between quantitative char- 
acteristics and experimental condi- 


ple hypothetical 


ulus generaliza- 
tion. Ss are trained to press a key 
to Stimulus 1, and are then tested
once each on Stimuli 1, 2, 3, 4, and 5
increasingly dissimilar 
from Stimulus 1 in that order. For 


simplicity assume that all Ss respond 
to Stimulus As ih 


esp are more 
similar to Stimulus 1; and that all Ss


who respond to a given number of 
identical latencies, 
Table 1 shows data for this hypo-
thetical experiment. These latencies
have been assigned in accordance
with the following assumptions: (a)
for each S latencies in


stimuli become less similar to 


Note that despite Ag.
ies de-
1 through
present a gradient of
which 


the relative frequencies of occurrence
of the response. The decrease in 
mean latency with increasing dissim- 
ilarity from the training stimulus is 
not a necessary consequence of the 
chosen assumptions. Within their 
framework it would have been possi- 
ble to assign latency values so as to 
produce almost any pattern of mean 
latencies. The particular pattern 
produced was selected because it is 
similar to results reported from sev-
eral studies of stimulus generaliza-
tion (Brown, Bilodeau, & Baron,
1951; Gibson, 1939; Rosenbaum,
1953).


TABLE 1
LATENCY DATA FROM A HYPOTHETICAL EXPERIMENT ON STIMULUS GENERALIZATION

Number of Ss                  Stimulus Number
Responding             1        2        3        4        5

      40             300      330.1    347.7    360.2    369.9
       5             390.3    429.4    452.3    468.6     —
      10             443.1    487.6    513.7     —        —
      15             480.6    557.9     —        —        —
      30             509.7     —        —        —        —

Mean Latency         408.8    408.5    387.39   372.2    369.9
Proportion of Ss
Responding            1.00     .70      .55      .45      .40


The problem posed by inherent
differential sampling does not depend
on each S being tested only once on
each stimulus, nor on a perfect or-
dering of the stimuli in ability to elic-
it the response. In any experiment
of this kind where each S is tested n
times on each of K stimuli the prob-
lem can also arise. Let Pij designate
the probability that the ith S will
respond to the jth stimulus. For the
training stimulus, or for closely sim-
ilar stimuli, the Pij’s will tend to be
large and most Ss will exhibit at
least one response to such stimuli.
But for stimuli widely different from
the training stimulus the values of
Pij are apt to be small and, since n
is typically a small number, many Ss
will make no responses to such a
stimulus. But if rapid responders
tend generally to have larger Pij’s
than slow responders the rapid re-
sponders will provide increasingly
large proportions of the latency data
as the stimuli become increasingly
dissimilar from the training stimulus
and hence as the entire set of Pij’s
become increasingly smaller. Thus
the mean response latencies to the
various stimuli will be affected by
inherent differential sampling and
the relationship between the mean
latencies and the various stimuli
will be a distortion of the relationship
that holds for individuals.¹
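
The distortion in Table 1 can be checked by direct computation. The following is a minimal sketch in Python (a language chosen here for illustration only); the names TABLE_1, pooled_summary, and rises_within_subgroup are assumptions introduced for the example, and the values are those tabled above. It recomputes the pooled mean latencies and proportions responding, and confirms that latency rises with dissimilarity within every subgroup even though the pooled means fall.

```python
# Table 1, one row per subgroup: (number of Ss, latencies to Stimuli 1..5).
# None marks a stimulus to which that subgroup made no response.
TABLE_1 = [
    (40, [300.0, 330.1, 347.7, 360.2, 369.9]),
    (5, [390.3, 429.4, 452.3, 468.6, None]),
    (10, [443.1, 487.6, 513.7, None, None]),
    (15, [480.6, 557.9, None, None, None]),
    (30, [509.7, None, None, None, None]),
]


def pooled_summary(table):
    """Return (mean latencies, proportions responding) pooled over all Ss."""
    total_ss = sum(n for n, _ in table)
    means, proportions = [], []
    for j in range(len(table[0][1])):
        # Only subgroups that responded to stimulus j contribute latency data.
        rows = [(n, lat[j]) for n, lat in table if lat[j] is not None]
        responders = sum(n for n, _ in rows)
        means.append(sum(n * x for n, x in rows) / responders)
        proportions.append(responders / total_ss)
    return means, proportions


def rises_within_subgroup(latencies):
    """True if the observed latencies increase from stimulus to stimulus."""
    observed = [x for x in latencies if x is not None]
    return all(a < b for a, b in zip(observed, observed[1:]))


means, proportions = pooled_summary(TABLE_1)
print("pooled means:", [round(m, 1) for m in means])
print("proportions: ", [round(p, 2) for p in proportions])
print("latency rises within every subgroup:",
      all(rises_within_subgroup(lat) for _, lat in TABLE_1))
```

The pooled means fall from stimulus to stimulus only because the slower subgroups drop out of the average as the stimuli become less similar to the training stimulus.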

Before discussing ways of remov- 
ing the effects of inherent differential 
sampling from data it seems advisa- 
ble to give a further example of an 
experiment in which inherent differ- 
ential sampling may be a problem. 
Many such experiments will be those 
in which E is interested in testing 
hypotheses of the form: (a) is a cer- 
tain effect produced more frequently 


¹ In this kind of experiment E sometimes
assigns an arbitrary value to serve as latencies
for trials on which no response occurs. Such a
procedure will distort the relationship which
exists between latencies and stimulus dimen-
sions.

by some conditions than by others,
and (b) is the magnitude of the effect 
when it is produced greater under 
certain conditions than under others. 
An example of such an experiment is 
one where E is interested in whether 
various “stress-producing” condi-
tions differ in the extent to which
they disrupt performance. Having 
each S perform under control condi- 
tions and under one or more of the 
stress conditions may permit E to 
identify Ss whose performance is 
disrupted, e.g., those who make 
more errors under stress than before 
stress. The proportion of Ss show- 
ing the disruption effect under each 
condition can be used to test a hy- 
pothesis of the form of (a). To test 
a hypothesis of the form of (b) it is 
necessary to compare the mean pre- 
and poststress differences for Ss 
showing the disruption effect under 
the various conditions. But this 
measure is subject to the possibility 
of inherent differential sampling. 
Conditions which lead to disruption 
for almost all Ss may elicit small ef- 
fects from Ss who would not exhibit 
a disruption effect under other condi- 
tions, whereas those Ss who exhibit a 
disruption effect under weaker con- 
ditions may in general exhibit large 
disruption effects. 

In some experiments it may be pos- 
sible to test whether inherent differ- 
ential sampling does or does not af- 
fect the results. In the example un- 
der discussion differential inherent 
sampling would lead to unequal vari- 
ances for the magnitude of the dis- 
ruption effect under the different 
conditions. Similarly for the experi- 
ment exemplified in Table 1 differ- 
ences among the subgroups in re- 
sponse latencies to Stimulus 1 would 
indicate that inherent differential 
sampling would be a factor. Unfor- 
tunately a lack of significant differ- 
ences in the latter case would not 


rule out the possibility of inherent 
differential sampling since a relation- 
ship between the probability of re- 
sponses to generalization stimuli and 
rapidity of responding might still 
exist. 


WAYS OF OVERCOMING DISTORTION
DUE TO INHERENT DIFFERENTIAL 
SAMPLING 


Since inherent differential sam- 
pling poses problems only in the anal- 
ysis of quantitative characteristics 
the difficulty can be avoided by using 
relative frequency data. However 
hypotheses of interest cannot always 
be tested by such data. As suggested 
in the preceding paragraph there is 
a class of hypotheses which demands 
that E lay himself open to the vicissi- 
tudes of inherent differential sam- 
pling. 

Even though it may be impossible 
to avoid the presence of inherent dif- 
ferential sampling it may be possible 
to avoid contamination of the data 
from this source if the Ss can be strat- 
ified in such a way that inherent dif- 
ferential sampling does not operate 
within strata. The surest way of ac- 
complishing this is to expose each S 
to more than one, and preferably to 
all, of the experimental conditions. 
When this can be done one assem- 
bles all Ss who exhibit the phenom- 
enon under the same subset of condi- 
tions into a subgroup. Within each 
such subgroup the problem of inher- 
ent differential sampling will not 
exist since each S will contribute data 
under each condition.² Further data
treatment should be carried out for 
each subgroup separately. If the 

² In some experiments some of the Ss may
contribute more data than others, e.g., some
Ss might respond on more trials under a given
condition than others. This difficulty is easily
overcome by using a mean or median value
for each S if more than one instance of the
phenomenon can occur for a si
no single S
one condition,

functional form of the relationship
between experimental conditions and 
quantitative characteristics is identi-
cal for each S, it will frequently be ob-
tainable from the data for each sub-
group, or at least from some of them.
(The general question of the relation- 
ship between group and individual 
curves is beyond the scope of this 
paper but a discussion of it can be 
found in an article by Estes [1956].) 
Application of this procedure to the 
data in Table 1 would involve treat- 
ing the data on each line separately, 
and such treatment will yield the re- 
lationship used to produce the tabled 
values. 
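
The stratification described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: each S is represented as a dict mapping a condition label to an observed value (or None when the phenomenon did not occur), and the function names are hypothetical. Ss are grouped by the exact subset of conditions under which they exhibit the phenomenon, and means are then computed separately within each subgroup.

```python
from collections import defaultdict


def stratify_by_response_pattern(records):
    """Group Ss by the subset of conditions under which they gave data.

    records: one dict per S, mapping condition -> value, or None when the
    phenomenon did not occur under that condition.
    """
    groups = defaultdict(list)
    for record in records:
        pattern = tuple(sorted(c for c, v in record.items() if v is not None))
        if pattern:  # Ss who never show the phenomenon contribute nothing
            groups[pattern].append(record)
    return dict(groups)


def subgroup_means(groups):
    """Mean value per condition, computed separately for each subgroup."""
    return {
        pattern: {c: sum(m[c] for m in members) / len(members) for c in pattern}
        for pattern, members in groups.items()
    }


# Hypothetical data: three Ss, Stimuli 1-3, latencies in arbitrary units.
example = [
    {1: 300.0, 2: 330.0, 3: None},
    {1: 310.0, 2: 340.0, 3: None},
    {1: 500.0, 2: None, 3: None},
]
for pattern, means in subgroup_means(stratify_by_response_pattern(example)).items():
    print(pattern, means)
```

Because every S in a subgroup contributes a value under every condition in that subgroup's pattern, the comparison across conditions is made on the same individuals.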

In some experiments it may be im- 
possible to expose Ss to more than 
one condition. When this is the case 
stratification can be accomplished 
only indirectly, if at all. If there is 
known to be another variable, per- 


haps a test score, which is related to 
the probability that an individual 
will display the phenomenon, Ss can 
be stratified on this basis, and the 
data for subgroups assembled on the 
basis of similar scores on the external 
variable can be compared. The ex- 
tent to which such a procedure will 
be effective in eliminating inherent 
differential sampling will depend, of 
course, on the degree of relationship 
between the external variable used 
and probability of occurrence of the 
phenomenon. 

Finally, there may be some prob- 
lems for which no amount of experi- 
mental ingenuity will be sufficient to 
design an adequate experiment in 
which Ss can be exposed to more 
than one condition, and where no 
highly correlated external variables 
can be found. For these problems the 
author knows of no existing solution.


Factor analysis was developed pri-
marily by psychologists. Apart from
the very early debates (Pearson & 
Moul, 1927; Spearman & Holzinger: 
1924, 1925, 1929; Wishart, 1928) 
about the standard errors of tetrad- 
differences, development was along 
mathematical rather than along sta- 
tistical lines. This is apparent from 
Thurstone’s writings (Thurstone: 
1935, 1947), for example, where lib- 
eral use is made of matrix algebra 
and where (1947, p. 282 et seq.) the 
rank of the side matrix, rather than 

any sampling error considerations, is 
the guiding principle in determining 
the “dimensionality” of the common 
factor space. 

The first attempt by a statistician 
to deal with a problem resembling 
that of the factor analyst came in
1933 with Hotelling’s paper (1933) on
the “Analysis of a Complex of Statis-
tical Variables into Principal Com-
ponents,” though, as Burt often
points out, Karl Pearson had dealt 
with a closely related problem as 
early as 1901 in a paper entitled “On 
Lines and Planes of Closest Fit to a 
System of Points in Space.” But a 
principal component analysis, being 
only an empirical method of “break- 
ing down” a covariance or correla- 
tion matrix into orthogonal com- 
ponents, equal in number to the num- 
ber of variables, must not be con- 
fused—as Bartlett (1953), and as 
Kendall and Lawley (1956) remind 
us—with a factor analysis of the 
same matrix. In a factor analysis, by 
contrast, there is a basic assumption 
that the scores obtained by Ss on the 
variables concerned can be explained 
in terms of a few “common factors,”
plus a specific factor for each vari-
able, and this implies that some hy-
pothesis about the number of “fac- 
tors” necessary and sufficient to ac- 
count for the covariance between the 
variables is being tested when a factor 
analysis of a covariance or correlation 
matrix is performed. Consequently, 
it was not until Lawley, who had 
become familiar with the factor prob- 
lem proper during his association with 
Godfrey Thomson, produced his clas- 
sic paper in 1940, on “the estimation 
of factor loadings by the method of 
maximum likelihood,” that a rigorous 
statistical solution of the factor prob- 
lem was available and factor analysis 
became a respectable statistical tech-
nique. Lawley’s paper was not ac-
cepted without criticism (Kendall &
Lawley, 1956; Young, 1941) at first, 
but it served the immediate purpose 
of stating the factor problem in lan- 
guage which mathematical statis- 
ticians could understand, and though 
in the intervening years since 1940 
only a few steps forward have been 
achieved, many statisticians have 
turned their attention to factor 
analysis (Anderson & Rubin, 1955; 
Danford, 1953; Kendall, 1954; Law- 
ley, 1940; Rao, 1955; Rippe, 1953; 
Wold, 1953) attracted as much by the 
complexity of the problems involved 
as by the intrinsic interest of the sub- 
ject itself. 

Previous to Lawley’s 1940 paper, 
and during the subsequent decade— 


variety of factorial techniques ap-
peared in psychometric journals, not
only for extracting factors and for 
testing the significance of residual 
matrices, but also for rotating factors 
into “psychologically meaningful” po- 
sitions and for comparing factors de- 
rived from different studies. These 
techniques, logically derived or em- 
pirically justified, became the stock 
in trade of practicing factor analysts. 
While they served, on the whole, a 
fairly useful purpose and are likely 
to continue in use in pilot studies 
with small samples, it is probable that 
future historians will be severely 
critical of them and of their users; 
critical of the techniques because of 
their approximate nature and lack of 
precision and statistical efficiency, 
and of their users for extravagant 
claims on their behalf. Indeed, even 
amongst psychologists, who them- 
selves occasionally use factorial tech- 
niques, criticism to date has not been 
lacking (Burt, 1952; Holzinger: 1940, 
1942; McNemar, 1941; Maxwell, 
1956; Reyburn, 1943; Saunders, 1948).
In view of these salutary rebukes it is 
surprising that such efficient factorial 
techniques as do exist are not more 
frequently employed. This appears 
to be due in part to the intrinsic com- 
plexity of the basic factor equations 
themselves, for which no efficient yet 
simple solution has yet been found;
and in part to the onerous nature of 
the calculations involved in the rigor- 
ous procedures at present available 
(Howe, 1955; Lawley: 1940, 1953, 
1958; Rippe, 1953). However, with 
the advent of electronic computers 
the latter excuse is no longer valid. 
Consequently this paper aims at 
bringing to the psychologist’s atten- 
tion some (large sample) statistical 
methods which should be of value to 
him. Some of these are only tangen- 
tially associated with factor analysis, 
others are directly associated with it. 


SIGNIFICANCE OF A CORRELATION
MATRIX


Occasionally an investigator (Ey- 
senck, 1952, pp. 12-14) wishes to 
test a correlation matrix for signif- 
icance, that is, he wishes to assess the 
statistical significance of the entire 
correlation structure of his variables. 
This would be especially desirable if 
the intercorrelations proved to be 
very low and it had to be decided 
whether or not a factor analysis of 
the matrix was justified (Bannatyne, 
1953). The appropriate test for the 
significance of a correlation matrix 
was given in 1950 by M. S. Bartlett 
with acknowledgement of earlier 
work by Wilks (1932). It is a chi- 
square test, namely, 


$$\chi^2 = -\left(n - \frac{2p + 5}{6}\right) \ln |R| \qquad [1]$$


with $\tfrac{1}{2}p(p-1)$ degrees of freedom,
where N is the sample size and n =
(N−1), p is the number of variables,
|R| is the determinant of the correla-
tion matrix R, and “ln” stands for
$\log_e$. The laborious part of this calcu-
lation is the determination of |R|,
for which an electronic computer 
would be very desirable if p were 
greater than ten or so. However, if 
the approximation 


$$-\ln |R| \approx \sum_{i<j} r_{ij}^2, \qquad (i, j = 1, \ldots, p)$$


due to Lawley (1940), is employed,
where $r_{ij}$ refers to the correlation co-
efficients, the labor involved in using
Formula [1] is greatly reduced. It
would appear too (Bartlett, 1950, p.
84), that this approximation is re-
markably accurate even for moder-
ately large samples, and the calcula-
tions could be performed on an or-
dinary desk calculator. Lawley
writes the formula in the form,


$$\chi^2 = n \sum_{i<j} r_{ij}^2 \qquad [2]$$

for it would appear that the multi-
plier $-(n - (2p + 5)/6)$ is a bit dubi-
ous.
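
Formulas [1] and [2] are short enough to compute directly. The sketch below is an illustration in Python, not part of the original presentation; the function names and the 3 × 3 example matrix are assumptions made for the example. It evaluates Bartlett's statistic from the determinant of R, evaluates Lawley's approximation from the sum of squared correlations, and returns the degrees of freedom against which either value would be referred to a chi-square table.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row interchange changes the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_chi_square(R, N):
    """Formula [1]: returns (chi square, degrees of freedom)."""
    p = len(R)
    n = N - 1
    chi_square = -(n - (2 * p + 5) / 6) * math.log(determinant(R))
    return chi_square, p * (p - 1) // 2


def lawley_chi_square(R, N):
    """Formula [2]: n times the sum of squared correlations, i < j."""
    p = len(R)
    n = N - 1
    total = sum(R[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
    return n * total, p * (p - 1) // 2


# Hypothetical correlation matrix for three variables, N = 101.
R = [[1.0, 0.1, 0.2],
     [0.1, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
print(bartlett_chi_square(R, 101))
print(lawley_chi_square(R, 101))
```

For small correlations the two statistics are close, which is the point of the approximation; they diverge as the correlations grow.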


COMPARING TWO VARIANCE-
COVARIANCE MATRICES


The need for a valid test of the dif- 
ferences between two or more vari- 
ance-covariance matrices is obvious 
to the factor analyst at least, for if 
two such matrices containing the 
same variables (and them alone) dif- 
fer, then their factorial composition 
must also differ. It is desirable too to 
do such a test on the covariance 
rather than on the correlation mat- 
rices, for a comparison of the latter 
would be invalidated if the sample 
variances, which are employed when 
converting the covariances into cor-
relations, themselves differ signif-
icantly.

The test may be deduced from
Wilks’s paper (1932). Let the two
samples concerned have degrees of
freedom $n_1$ and $n_2$ respectively, i.e.,
the sample sizes are $(n_1 + 1)$ and
$(n_2 + 1)$. Let $A_1$ and $A_2$ be the re-
spective covariance matrices with p
variates each. Let the “pooled” co-
variance matrix be A, with $(n_1 + n_2)$
degrees of freedom: i.e.,
$A = (n_1 A_1 + n_2 A_2)/(n_1 + n_2)$, or the elements of
Matrix A are weighted means of the
elements of Matrices $A_1$ and $A_2$.
Then the quantity


$$(n_1 + n_2) \ln |A| - n_1 \ln |A_1| - n_2 \ln |A_2| \qquad [3]$$

is distributed, on the null hypothesis,
approximately as chi square with
$\tfrac{1}{2}p(p+1)$ degrees of freedom, where
p refers to the number of tests; $|A|$, $|A_1|$ and $|A_2|$
stand for the values of the deter-
minants of the matrices A, $A_1$ and
$A_2$ respectively. Considerations by
Box (1949) show that this test can
be made more sensitive in the case of
moderately large samples by apply-
ing the multiplier,


$$1 - \frac{2p^2 + 3p - 1}{6(p + 1)} \left( \frac{1}{n_1} + \frac{1}{n_2} - \frac{1}{n_1 + n_2} \right)$$

to the expression for chi square given
above.
It is easy to see how Expression 
[3] would have to be adjusted if 
k rather than just two covariance 
matrices were being compared: Box
(1949) actually considers this more
general case. 
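
A minimal sketch of the two-matrix comparison follows, assuming Expression [3] and Box's multiplier in the forms usually given for two groups; the Python function names and the example matrices are hypothetical. The function returns the unadjusted statistic, the statistic after applying the multiplier, and the degrees of freedom.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def compare_covariance_matrices(A1, A2, n1, n2):
    """Return (Expression [3], Box-adjusted value, degrees of freedom).

    A1, A2: sample covariance matrices; n1, n2: their degrees of freedom.
    """
    p = len(A1)
    # Pooled matrix: element-wise weighted mean of A1 and A2.
    pooled = [[(n1 * A1[i][j] + n2 * A2[i][j]) / (n1 + n2) for j in range(p)]
              for i in range(p)]
    statistic = ((n1 + n2) * math.log(determinant(pooled))
                 - n1 * math.log(determinant(A1))
                 - n2 * math.log(determinant(A2)))
    multiplier = 1 - ((2 * p * p + 3 * p - 1) / (6 * (p + 1))
                      * (1 / n1 + 1 / n2 - 1 / (n1 + n2)))
    return statistic, multiplier * statistic, p * (p + 1) // 2


# Hypothetical example: two 2 x 2 covariance matrices, 10 df each.
A1 = [[1.0, 0.0], [0.0, 1.0]]
A2 = [[2.0, 0.0], [0.0, 2.0]]
print(compare_covariance_matrices(A1, A2, 10, 10))
```

When the two matrices are identical the pooled matrix equals both of them and the statistic is zero, as the null hypothesis would lead one to expect.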


THE SIGNIFICANCE OF RESIDUAL 
MATRICES 
Closely associated with the prob- 
lem of finding efficient estimates of 
factor loadings, which the maximum 
likelihood method provides, is the 
problem of testing the hypothesis 
concerning the number of factors 
necessary to “explain” the covariation 
between the variables. The first 
satisfactory answer to this problem 
was also given by Lawley in his 1940
paper in the form of a chi-square test
of the significance of the residual
matrix after k factors had been fitted.
The exact formula is

$$\chi^2 = n \ln \left( |C| / |A| \right) \qquad [4]$$

with $\tfrac{1}{2}(p-k)^2 - \tfrac{1}{2}(p+k)$ degrees of
freedom, where A is the observed
correlation matrix and C is the fitted
correlation matrix given by $C = LL' + V$,
L being the matrix of factor load-
ings, L′ its transpose, and V the
diagonal matrix of specific variance
(inclusive of error variance). The
evaluation of the determinants of the
Matrices C and A, namely |C| and
|A|, each of order p, without elec-
tronic computing aid, has in the past
made the use of Formula [4] imprac-
ticable. However, there is an approxi-
mation to it, namely,


$$\chi^2 = n \sum_{i<j} \frac{r_{ij}^2}{s_i^2 s_j^2}, \qquad (i, j = 1, \ldots, p) \qquad [5]$$


where $r_{ij}$ is the residual in the ith row
and jth column of the residual ma-
trix, $s_i^2$ is the specific variance (in-
cluding error) of the ith test, and
N = n + 1 is the sample size, which
can be evaluated readily on a desk
computer. Formula [2] is the limiting
case of Formula [5] when k is taken
as zero, that is when no factors are 
postulated and the correlation struc- 
ture is assumed not to differ signif- 
icantly from zero. 
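
The approximate test of Formula [5] can be sketched as follows. This is an illustration in Python under the reading of the formula given above; the function name, argument names, and example values are assumptions, and the residuals and specific variances would in practice come from a fitted factor solution.

```python
def residual_chi_square(residuals, specific_variances, N, k):
    """Formula [5]: approximate chi square for a residual matrix.

    residuals: p x p residual matrix (only entries with i < j are used).
    specific_variances: the p specific variances (including error).
    N: sample size; k: number of factors fitted.
    Returns (chi square, degrees of freedom).
    """
    p = len(residuals)
    n = N - 1
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            total += residuals[i][j] ** 2 / (
                specific_variances[i] * specific_variances[j])
    # (p - k)^2 and (p + k) have the same parity, so the division is exact.
    df = ((p - k) ** 2 - (p + k)) // 2
    return n * total, df


# Hypothetical residuals after fitting k = 1 factor to p = 4 tests, N = 101.
residuals = [[0.00, 0.05, 0.00, 0.00],
             [0.05, 0.00, 0.00, 0.00],
             [0.00, 0.00, 0.00, 0.04],
             [0.00, 0.00, 0.04, 0.00]]
specific = [0.5, 0.5, 0.4, 0.8]
print(residual_chi_square(residuals, specific, N=101, k=1))
```

With k = 0 and every specific variance equal to one, the residuals are the correlations themselves and the function reduces to Formula [2].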

In his 1940 paper Lawley em- 
phasized that the chi-square test 
given by Formulae [4] and [5] should 
not be employed for testing the signif- 
icance of a matrix of residuals except 
in the case where efficient estimates 
of the factor loadings had been ob- 
tained. This ruled out its use, for a 
time, where the centroid method of 
analysis was concerned; latterly this 
restriction has been removed. In a 
paper entitled “A Statistical Exami- 
nation of the Centroid Method,” Law- 
ley (1955) examines the statistical 
efficiency of the latter method of 
analysis (in the special case where 
the communalities are taken as 
given), and obtains a test of signif- 
icance of the residuals. Not only 
was the statistical efficiency of the 
centroid method for the special case 
considered found to be very high, 
but also the significance test proved 
to be exactly that given earlier for 
use with the maximum likelihood 
method. This is an encouraging dis- 
covery, and it is safe to conclude that 
when care is taken with the centroid 
method to ensure that maximum 
variance is extracted with each suc- 
ceeding factor—a result which is not 
always achieved by following Thur- 
stone’s rules of thumb for sign reflec- 
tion, as Thurstone himself knew very 
well—and when a few iterations have 
been performed to ensure that the 
loadings are well on the way to con- 
vergence, then Formula [5] above 


provides a valid and efficient test of 
the significance of the residual ma- 
trix. 

Lawley’s approximate chi-square 
test would then appear preferable for 
use with the centroid method even to 
the general test of “the complete- 
ness of factor solutions” given by 
Rippe (1953). Rippe’s test has not 
been widely used to date primarily 
because it involves the laborious task 
of inverting a large covariance ma-
trix. But even if we assume that with
electronic computers this is no longer 
an obstacle, there is also the fact, 
which Rippe points out, that the tests 
given above are more powerful than 
his. 

Previous to Rippe’s and Lawley’s 
work, psychologists, using the cen- 
troid method, had to be content with 
empirical tests for deciding the ques- 
tion of “how many factors?” Of the 
numerous empirical tests in current 
use two may be mentioned—one due 
to McNemar (1942), and one due to 
Burt (1952). McNemar’s test is too 
widely known to need description 
here. Burt’s is a chi-square test,
namely,


$$\chi^2 = (N - 3) \sum (z - Z)^2 \qquad [6]$$


with $\tfrac{1}{2}(p-k)^2 - \tfrac{1}{2}(p+k)$ degrees of
freedom, where N is the sample size,
p the number of variables and k the
number of factors extracted; z refers
to Fisher’s transformation of the cor-
relation coefficient and Z to a similar
transformation of the amount of the
correlation accounted for by the fac-
tors extracted.
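
For comparison with the tests above, Burt's statistic can be sketched as follows. The sketch assumes that Formula [6] sums the squared differences between Fisher's z for each observed correlation and the same transformation of the correlation reproduced from the loadings; that reading, the function names, and the example values are assumptions made for illustration, and the sketch is not an endorsement of the test.

```python
import math


def reproduced_correlation(loadings, i, j):
    """Correlation between tests i and j implied by the factor loadings."""
    return sum(a * b for a, b in zip(loadings[i], loadings[j]))


def burt_chi_square(R, loadings, N):
    """Formula [6] as read here: returns (chi square, degrees of freedom).

    R: observed correlation matrix (p x p).
    loadings: p rows, one loading per extracted factor (k columns).
    """
    p = len(R)
    k = len(loadings[0])
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            z_observed = math.atanh(R[i][j])  # Fisher's z transformation
            z_fitted = math.atanh(reproduced_correlation(loadings, i, j))
            total += (z_observed - z_fitted) ** 2
    df = ((p - k) ** 2 - (p + k)) // 2
    return (N - 3) * total, df


# Hypothetical one-factor loadings for four tests, N = 100.
loadings = [[0.8], [0.6], [0.5], [0.4]]
R = [[1.0 if i == j else reproduced_correlation(loadings, i, j)
      for j in range(4)] for i in range(4)]
R[0][1] = R[1][0] = 0.50  # one observed correlation departs from the fit
print(burt_chi_square(R, loadings, N=100))
```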

At first sight, Burt’s test might ap-
pear to have considerable validity.
Justifying it he states (1952, pp.
126-127) that since “the distribution
of $z\sqrt{N-3}$ is nearly normal, with
unit deviation, it follows that on
summing the squares for all the dis-
crepancies and multiplying the result
by (N−3), we obtain a quantity
which will be distributed approxi-
mately as chi-square.” But this is a
good example of how deceptive em- 
pirical and approximate methods can 
be. For one thing, it is likely that z 
and Z would be correlated, in which 
case a quadratic form of these ex- 
pressions would be required in For- 
mula [6]; then again nothing is shown 
about the distribution of the “corre- 
lations” as reconstructed from the 
factor loadings, so that we do not 
know whether the z-transformation is 


ley’s test, and a B when it agrees 
with Burt’s test; in other cases a
sufficient number of factors had not 
been extracted to decide the issue. 


TABLE 1 


VALUES OF CHI SQUARE OBTAINED
BY LAWLEY’S EXACT TEST, AND
BURT’S APPROXIMATE TEST


Lawley’s Burt’s 
Sample df Test Test 


x? 2 Teste 
1 24 120.4 94.5 
1 3 ; 


McNemar’s


A fOr E ore E 
Bu o 2a a L 
eile Feo as = 
5S 26 43.4 932 L 
S 26 50,9) 53% L 
7 24 238.3 B30 Pi 
8 16 168.9 -42g 
“a. 13 28.9 199 B 
1U 29.0 «fod L 
a 910 ae B 


* L indicates that McNemar’s Test agrees
with Lawley’s and B that it agrees with
Burt’s.


This table shows that the results 
given by Burt's approximate test 
are of a different order from those 
given by Lawley’s exact test, and 
in each case grossly underestimate 
the value of chi square. McNemar’s
test shows up in a slightly better 
light. 


VARIANCES AND COVARIANCES OF 
ESTIMATED LOADINGS


A simple method of estimating the 
standard errors of factor loadings
would be of inestimable value to
factor analysts, but unfortunately
none has yet been found. The first
serious attempt to meet the need
appeared in 1949 in a paper entitled
“Problems in Factor Analysis,” again
by Lawley (1949). The expressions
involved were very manageable and 
were used by Emmett (1949) in a 
study reported in the same year. Un- 
fortunately, it was later revealed that 


factory, for “the estimated loadings 
obtained for a given set of tests in
are slightly correlated
with estimated loadings for the same
set of tests in another factor.”

In 1953 Lawley returned to the
problem of the estimation of the vari-
ances and covariances of factor load-
ings, in his Uppsala paper on “A
Modified Method of Estimation in
Factor Analysis and Some Large
Sample Results,” where new expres-


are assumed 
to be known. As such knowledge is in
general denied us we are still in the
position of having no valid method
for estimating the standard errors of
factor loadings. This is a serious gap
in the statistical theory of factor 
analysis. Lacking knowledge of the 
magnitude of the standard errors of 
factor loadings we are not in a posi- 
tion to decide in any particular case 
if a specific loading is significantly 
different from zero. This means that 
claims by factor analysts that they 
have, for example, “rotated their 
factors to simple structure,” carry 
with them little conviction, for “sim-
ple structure” implies that certain
loadings are zero. If it proves to be 
the case, as McNemar (1941) and 
others have suggested, that the 
standard errors of loadings, especially 
with small samples, tend to be large, 
that is, that the precision of most fac- 
tor studies is low, then the “confi- 
dence limits” of any given “simple 
structure” are likely to be wide, and 
as a result that simple structure may 
readily be achieved. Under condi- 
tions of low precision, however, such 
an achievement is of little worth. 
Consequently, and in view of the fact 
that the “simple structure” concept 
itself does not seem capable of rigo- 
rous mathematical or statistical defi- 
nition (Uppsala Symposium on Psy- 
chological Factor Analysis, 1953, p. 
77 et seq.), it is doubtful whether it is 
worthy of retention as a precise con- 
cept in a valid and efficient statistical 
theory of factor analysis. This recom- 
mendation, which in some quarters is 
likely to be considered irreverent if 
not downright heretical, is made on 
the strength of a noteworthy advance 
in the statistical theory of factor 
analysis, by Anderson and Rubin 


(1955) and by Howe (1955). 


FACTOR IDENTIFICATION AND
FACTORIAL INVARIANCE 


In the contributions just referred 
to, the problems involved in the esti- 
mation of factor loadings, under vari- 
ous different assumptions, are dis- 


cussed, as also is the problem of de- 
ciding on the number of factors. 
Howe, too, devotes valuable space to 
the practical issues involved in the 
solution of the fundamental factor 
equations; in particular to the itera- 
tive processes most likely to be of ad- 
vantage. But our reference to the 
work of Anderson and Rubin and of 
Howe is a specific one. 

It is well known that though the 
question of the number of factors 
necessary to explain the correlations 
in a matrix can now be answered, this 
does not enable the factor loadings to 
be defined uniquely. The factor axes 
themselves may be rotated into an in- 
numerable number of different posi- 
tions within the factor space, any one 
of which is as good as any other as far 
as the reproduction of the correlation 
matrix is concerned. The position 
finally chosen in any study is that 
which the analyst thinks is most 
meaningful from a psychological 
viewpoint. The arbitrariness of such 
a procedure has already been noted. 
The important additional contribu- 
tion which the writers just mentioned 
have made is this: if previous to an 
analysis the experimenter can say 
that, for the factors postulated, cer- 
tain factor loadings will be zero, that 
is if he can specify a particular “simple
structure,” and if these zeros are
sufficiently numerous and conven- 
iently placed, these conditions are 
sufficient to determine the factor so- 
lution completely and validly. Howe 
gives a worked example. 
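
The rotational indeterminacy described above can be illustrated numerically. The following is a minimal sketch, not the procedure of Howe or of Anderson and Rubin: the loading matrix is invented for illustration, and only two orthogonal common factors are assumed. It shows that a rotated set of loadings reproduces the same common-factor part of the correlation matrix even though the loadings themselves differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings: 6 tests on 2 orthogonal common factors.
loadings = rng.uniform(0.2, 0.8, size=(6, 2))


def rotation(theta):
    """Return the 2 x 2 orthogonal rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def reproduced(l):
    """Common-factor part of the correlation matrix, L L'."""
    return l @ l.T


# Rotate the factor axes by an arbitrary angle.
rotated = loadings @ rotation(np.pi / 5)

# The reproduced correlations agree, but the loadings do not.
same_fit = np.allclose(reproduced(loadings), reproduced(rotated))
same_loadings = np.allclose(loadings, rotated)
print(same_fit, same_loadings)
```

Because every rotation fits equally well, some outside restriction, such as a set of loadings specified in advance to be zero, is needed before a single solution can be singled out.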

Unaware of the researches of Howe 
and of Anderson and Rubin, Lawley 
(1958) was busy on the same problem 
and has produced a solution very 
similar to that given by Howe. 

The practical difficulty for the psychologist
of postulating in advance
not only the number of factors which 
he expects from an analysis, but also 
of specifying the positions of the zero 
loadings, cannot be overestimated;
but, as Howe points out, in cases 
where he is in doubt it would always 
be possible, by dividing his sample 
randomly in half, to carry out two
parallel studies. He could then ex- 
periment with rotations of the factors 
derived from one of the studies and, 
using the information thus derived, 
set up hypotheses regarding the num- 
ber and positions of the zero loadings, 
which he could then cross-validate on 
the data from the parallel study.

The justification for recommending
procedures such as that outlined by 
Howe, which involve long and rather 
tortuous calculations lies—as sug- 
gested at the beginning of this pa- 
per—in the computational facilities 
which electronic machines now offer. 
However, these procedures are com- 
mendable primarily on the grounds 
that they have a valid statistical basis.
The time surely has come for factor
analysts to assemble and use the
few precision instruments available 
to them, at least in research work pre- 
pared for publication, though in pilot 
studies the less precise instruments 
are likely to continue to be of value. 
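
The split-sample procedure attributed to Howe above can be sketched as follows. This is an illustrative outline only: the score matrix is simulated, the function name `split_half` is hypothetical, and the exploratory rotation and the confirmatory fit themselves are left out. It shows only the random division of subjects into two parallel halves, each yielding its own correlation matrix.

```python
import numpy as np


def split_half(data, seed=0):
    """Randomly divide the rows (subjects) of `data` into two halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(data.shape[0])
    mid = data.shape[0] // 2
    return data[order[:mid]], data[order[mid:]]


# Hypothetical scores: 200 subjects on 6 tests.
scores = np.random.default_rng(1).normal(size=(200, 6))

# One half is used to explore rotations and frame hypotheses about zero loadings;
# the other half is held back to cross-validate those hypotheses.
explore, confirm = split_half(scores)

r_explore = np.corrcoef(explore, rowvar=False)
r_confirm = np.corrcoef(confirm, rowvar=False)
```

Keeping the second half untouched until the hypotheses are fixed is what gives the cross-validation its force.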


SUMMARY


Some exact large sample statistical 
techniques for factor analysts are re- 
viewed. A recent investigation of the 
efficiency of the centroid method of 
analysis is noted, and a recommen- 
dation is made regarding a valid test 
of the significance of residual mat- 
rices when this method is employed. 
The “simple structure” concept is crit- 

THE “EIDETIC IMAGE” AND “HALLUCINATORY” BEHAVIOR:
A SUGGESTION FOR FURTHER RESEARCH 


THEODORE XENOPHON BARBER¹
Department of Social Relations, Harvard University 


After an extensive series of inves- 
tigations, Jaensch (1930) concluded 
that the “eidetic image” is neither
afterimage nor “memory image” but 
shares some of the characteristics of 
both. Although many other investi- 
gators (e.g., Busse, 1920; Meenes & 
Morton, 1936; Peck & Hodges, 1937) 
accepted this conclusion, Allport 
(1928) raised a dissenting voice, con- 
cluding from his own experiments 
and from a review of the literature 
that the “eidetic image” is a vivid
“memory image.” To complicate 
matters further, Morsh and Abbott 
(1945) recently reported, after an in- 
vestigation utilizing 700 Ss, that the 
“eidetic image” is a type of after- 
image. Investigators have not only 
disagreed on the interpretation of this 
phenomenon, they have also disagreed
on the primary data. Some investigators
report that practically all
individuals in a “large” population— 
e.g., Fischer and Hirschberg’s (1924) 
140 Ss—possess “eidetic imagery,” 
while other investigators report that 
none of their Ss, in an equally large 
population, possess this type of 
“imagery.” Nevertheless, we can ac- 
cept all of these contradictory re- 
ports: investigators have subsumed a 
number of phenomena under the 
general rubric of the “eidetic image.”
In some cases it is unlikely
that the term has referred to any 
type of imaginal behavior at all; in 
other cases, the “eidetic image” has
clearly referred to a negative or positive
afterimage; in still other instances,
the term has referred to
what can best be conceptualized as a
type of “hallucinatory” behavior.

1 Post-doctoral research fellow, National
version of this manuscript,

Investigations of this phenomenon 
usually proceeded as follows: (a) The 
experimenters first had the Ss experi- 
ence physiological afterimages. Ac- 
cording to E. R. Jaensch (1930), this 
was done in order “to demonstrate to 
the subjects exactly what it means to 
see something, although no object is 
actually present” (p. 4). (b) The Ss
were then asked to look at a picture
for a specified period of time—e.g., 
15 seconds—and then to look at a 
nearby gray “projection” screen and 
to report what they “saw” there. 
(c) If the Ss stated that they “saw”
the original picture on the “projection”
screen and gave other behavioral
indications that they actually
“saw” something there, they were 
classified as Eidetiker, i.e., as individuals
possessing “eidetic imagery.”
This procedure immediately raises a 
serious question. Did the Ss— 
who were almost always elemen- 
tary school children—state that they 
“saw” something on the screen in or- 
der to please the experimenter? Did 
some children report that they “saw” 
the picture there because they were 
perfectly aware, from the nature of 
the instructions, that the experimen- 
ter expected them to say that they 
“saw” the picture on the “projec- 
tion” screen? More than one investi- 
gator (Klüver, 1928; Morsh, 1945; 
Schwab, 1924) has emphasized the 
ambiguous nature of the word “see,” 
especially as it is used by children. 
Given the ambiguous nature of this 
term, Morsh and Abbott (1945) correctly
point out that when “Jaensch
recommends telling the child subject 
that something ‘must be seen’. . . it 
is expected that most children will 
‘see’ something.” In fact, in their 
own investigation, Morsh and Abbott 
(1945) found that when the children 
stated that they “saw” many minute 
details of the picture, they later ad- 
mitted that they “saw” them in their 
mind and not on the “projection” 
screen. 

If some children actually did “see”
something on the screen, how do we
know that what they “saw” was not
an afterimage? Since, in many in- 
stances, the children were instructed 
only to “look” at the original picture, 
there was nothing to prevent them 
from fixating a point and subsequent- 
ly experiencing an afterimage. All- 
port (1928), Koffka (1923), and 
other investigators (Schroff, 1926; 
Scola, 1925) report that this often oc- 
curred. Allport (1928), for example, 
writes that what some investigators 
have called an “eidetic image” is
“without doubt often an after-image 
(since it ‘can be produced through fix- 
ation’).” E. R. Jaensch (1930) has 
also noted that in some cases “eidetic
images” are difficult to differentiate
from afterimages; e.g., “In those
cases in which the imagination has
little influence, they are merely modified
after-images.” In fact, the
“eidetic images” of one large group 
of Eidetiker—the so-called “T-type” 
(W. Jaensch, 1926)—are difficult, if 
not impossible, to distinguish from 
physiological afterimages; both the 
afterimage and this type of “eidetic
image” are indistinct, cannot be al- 
tered in form and color at will, usu- 
ally follow Emmert’s law, and usually 
show the complementary color. 

In some instances, however, Ss 
state that they “see,” and behave as 
if they “see,” something on the “pro- 
jection” screen when the situation 
precludes their “seeing” an afterimage.
Investigators report that it is
not always necessary for the “eidetic”
S to first look at a picture
(Klüver, 1926). Not only do “spontaneous
[eidetic] images occur” (W.
Jaensch, 1921), but some individuals 
are “able after hours, days, and even 
months and years to reproduce an 
eidetic image with all its previous 
vividness” (Kroh, 1922) and, at 
times, the “eidetic image” even
“takes an obsessive character and 
recurs without volition” (Allport, 
1924). Purdy (1936) reports that his 
“eidetic” S is able, at practically any 
time, to “see” a vivid, three-dimensional
“eidetic image” of any person
or object; in fact, the S's “images” 
are often so vivid that they suppress 
the actual surroundings. In addi- 
tion, this S can “see” an “eidetic 
image” of a man devoid of a head, 
can “see” green leaves upon barren 
winter trees, can “see” a smooth- 
shaven man with a full beard, can 
“see” objects as altered in size and 
position, etc. There is no essential
difference between this S’s “eidetic 
images” and what has been termed 
since the last century as “the waking 
hallucinations of healthy persons” 
(Parish, 1897) and as the “nega- 
tive and positive hallucinations” of 
“good” hypnotic Ss (Barber, 1958b; 
Barber, in press). As McDougall 
(1929) and Smythies (1956) have 
pointed out, if an S states that he 
“sees,” and behaves as if he “sees,” 
a person or an object which has 
many if not all of the characteristics 
of an actual object, when that “ob- 
ject” is by no means present to other 
observers, the S is carrying out “hal- 
lucinatory” behavior in the strict 
sense of the term, even though he remains
perfectly aware that the “object”
is his own creation. To say that
Purdy’s S “knows” that what she 
“sees” does not actually exist, while 
the “good” hypnotic S “believes”
that his “hallucinations” are existential
realities, may be perfectly correct
(Barber, 1958a); however, this is
a secondary characteristic of the behavior
which we deduce after the behavior
has occurred. During the
event itself (i.e., when Purdy’s S is
“seeing” the “eidetic image” and
when the “good” hypnotic S is “seeing”
the “hallucination”), the behavior
in both cases is essentially the
same.

Similarly, when W. Jaensch (1926) 
discusses the “eidetic images” which
can be induced by mescaline, there is
no way that we can differentiate
these so-called “eidetic images” from
what other investigators (Beringer, 
1927; Marshall, 1937; Stockings, 
1940) have termed the “visual hal- 
lucinations” which can be induced 
(in some Ss) by this drug. Also, when 
Jaensch reports that the “eidetic image”
of a color is often followed by
its appropriate negative afterimage, 
he is reporting the same behavior 
which has been conceptualized, since 
1888, as the “color hallucinations” of
hypnotic Ss (Binet & Féré, 1888) 
and, more recently, as the “hallucinated
colors” of “normal” Ss (Barber,
in press). In fact, it is difficult,
if not impossible, to differentiate the 
second large group of Eidetiker— 
the so-called “B-type” (W. Jaensch, 
1921)—from individuals who can 
“hallucinate” at will; in both cases, 
the S is not only able to call up an 
“image” of an object or person and 
to banish it whenever he desires but 
he is also able to alter its form, color,
duration, and location at will.

Even if we agree that behavior 
should not be termed “hallucinatory” 
unless the S “believes” that the
“hallucinatory” object is a real ob- 
ject, we can still insist that some 
cases of “eidetic imagery” cannot be
differentiated from a type of “hallucinatory”
behavior; as E. R. Jaensch
(1930) writes, “In exceptionally
strong cases... eidetic images and 
real objects can under certain condi- 
tions be confused with one another” 
(p. 18). Fischer and Welke (1926) 
also note this difficulty and empha- 
size that “hallucinations” should 
be classified into three categories: 
(a) non-psychogenic hallucinations, 
(b) psychogenic hallucinations, and
(c) “eidetic images” with “reality- 
character.” 

If the term “eidetic image” has 
subsumed more than one type of be- 
havior, the contradictory data and 
conclusions of investigations in this 
area become understandable. Fur- 
thermore, if the term has referred, in 
some instances, to a type of “hallucinatory”
behavior, further investigations
of this phenomenon could
provide a solid basis for a general 
theory of “hallucinations.” However,
instead of asking Ss to look at a
picture (and then to report what they
“see” on a “projection” screen),
thus allowing the Ss to experience an
afterimage from the original picture,
we should directly ask them first to
“imagine” an object and then to at- 
tempt to “project” the imagined ob- 
ject at some point in the room. From 
Galton’s investigation (1883) during 
the last century, from the experi- 
ments on “eidetic imagery” and 
“hallucinatory” behavior (Martin, 
1915) during this century, and from 
the writer’s recent experiments (Bar- 
ber, 1957; Barber, in press) with 
“good” hypnotic Ss (who in many
cases were able to carry out this type
of behavior long before they were ever
“hypnotized”), we can expect to find
a relatively small proportion of
adults and a larger proportion of children
who “hallucinate” at will, i.e.,
who will, during the experiment, 
(a) report that they have “projected” 
the imagined object, (b) insist, when
critically examined, that it does not
differ at all, or differs only in some re- 
spects, from an actual object, and 
(c) behave, overtly and physiologically
(e.g., alpha blocking on the
electroencephalogram, alteration in 
pupil size) as if they are actually 
“seeing” an existential reality. 

A NOTE ON “CORRECTING PERSONALITY SCALES FOR
RESPONSE SETS OR SUPPRESSION EFFECTS” 


HAROLD WEBSTER 
University of California, Berkeley 


I am grateful to Samuel Messick, 
of Educational Testing Service, for 
pointing out an error in “Correcting 
Personality Scales for Response Sets 
or Suppression Effects,” which appeared
in this journal for January,
1958. The second part of Formula
[4] should read


$$\frac{s_T}{s_t}\, r_{Tt} \quad \text{instead of} \quad \frac{s_t}{s_T}\, r_{Tt}.$$

Messick also notes that Formula 
[7] can be improved by substituting 
rrces) for the regression coefficient b. 
This improved version of [7] may 
be derived using several approaches, 
but the most general is probably to 
consider T′ = T − bt a special case of
Mosier’s (1943) formula for the reli- 
ability of a weighted composite. It 
seems necessary in any of the meth- 
ods, including the application of 
Mosier’s formula, to regard b as a 
constant, thereby ignoring its sam- 
pling variance; in any case the sam- 
pling fluctuations of b are likely to be 
small in comparison with the other 
variations which are allowed for in 
the improved formula. 
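
The treatment of the corrected score as a weighted composite can be sketched numerically. The function below is an illustration under stated assumptions and is not a reproduction of Formula [7]: it assumes the commonly cited form of the reliability of a weighted composite with uncorrelated errors, applies it to the composite T − bt with weights 1 and −b, and treats b as a fixed constant. The function name and argument names are hypothetical.

```python
def composite_reliability(s_T, s_t, r_TT, r_tt, r_Tt, b):
    """Reliability of the composite T - b*t, with b treated as a constant.

    s_T, s_t   : standard deviations of the two scales
    r_TT, r_tt : reliabilities of the two scales
    r_Tt       : correlation between the two scales
    b          : weight applied to t (assumed fixed, sampling variance ignored)
    """
    # Variance of the composite T - b*t.
    var_composite = s_T**2 + (b * s_t) ** 2 - 2 * b * r_Tt * s_T * s_t
    # Error variance of the composite, assuming uncorrelated errors.
    error_var = s_T**2 * (1 - r_TT) + (b * s_t) ** 2 * (1 - r_tt)
    return 1 - error_var / var_composite


# Illustrative values only.
uncorrected = composite_reliability(10.0, 5.0, 0.8, 0.7, 0.3, 0.0)
corrected = composite_reliability(10.0, 5.0, 0.8, 0.7, 0.3, 0.6)
```

With b set to zero the expression reduces to the reliability of T alone, which is a convenient check on the arithmetic.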

A REVIEW OF THE RELATIONSHIPS BETWEEN PERSONALITY
AND PERFORMANCE IN SMALL GROUPS¹


RICHARD D. MANN 
University of Michigan 


A wide range of practical and theo- 
retical interests have found expres- 
sion in the study of small groups. As 
the major bibliographic sources 
(Hare, Borgatta, & Bales, 1955; Mc- 
Grath, 1957; Strodtbeck & Hare, 
1957) amply attest, small group re- 
search has proceeded along numerous 
independent lines. One interest, how- 
ever, has been dominant for more 
than 50 years. While phrased in 
various ways, the relationship be- 
tween the personality characteristics 
of the individual and his performance 
in the group has remained a central 
concern. 

There have been at least three con- 
ceptual approaches to this problem. 
One approach considers the individ- 
ual as having various needs and as 
being motivated to satisfy some of 
these needs through interaction with 
others; the point of interest is the re- 
lation between the individual's per- 
sonality and his goal-directed be- 
havior in groups. In another view, 
the individual is conceived of as a 
stimulus, or set of stimuli, for the 
other members of the group, and the 


1 The author wishes to express his gratitude 
to Roger W. Heyns for his valuable sugges- 
tions and criticism throughout the prepara- 
tion of this manuscript. The survey of the 
literature was carried out during the period 
when the author held a Research Training 
Fellowship granted by the Social Science Re- 
search Council. 


relation between the individual’s per- 
sonality and the way in which he is 
perceived and judged by his peers as- 
sumes primary importance. In the 
third approach, the group is con- 
ceptualized as a system confronted 
with various problems, external and
internal, and attention shifts to the 
processes whereby particular individ- 
uals volunteer or are selected to oc- 
cupy various positions and perform 
various roles necessary for the solu- 
tion of the problems. Although these 
three approaches have generated
many nonoverlapping research ques- 
tions, they have produced a body of 
data which may be considered mean- 
ingfully as a whole. 

This review attempts to summarize 
the present state of knowledge about 
the relationship of an individual’s 
personality to his behavior or status 
in groups. Although the independent 
effects of varying the nature of the 
sample and history or size of the 
group upon the performance of indi- 
viduals are not considered, an effort is 
made to determine the effect of such 
situational factors on the relation- 
ships observed between personality 
and performance.

While the purpose of this review is
to provide an adequate and accurate
description of the present state of
knowledge in the field, its intent is to
stimulate research rather than to 
make a final summary. It is thought 
that an organized presentation of the
findings to date may help to clarify 
relationships which have been over- 
looked or misunderstood. Moreover, 
it is hoped that this summary may be 
used as a target and a taking-off 
point for future research, thus en- 
couraging publication and helping to 
make knowledge in this field more 
cumulative. 


Selection of the Studies 


The studies selected for detailed 
examination meet the following five 
criteria: (a) the sample was drawn 
from a population of high school age 
or older; (b) the groups studied were 
face-to-face groups; (c) some assess- 
ment was made of the individual’s 
personality; (d) some assessment was 
made of the individual’s behavior 
or status in the group; and (e) the 
results were either in correlational 
form or made use of a control group, 
i.e., studies testing only leaders or 
only social isolates are not considered. 
The only exception to (b) occurs in 
those studies of conformity in which 
the individual believes he is interacting
with other individuals, whereas,
in fact, the experimenter has controlled
the interaction through tape
recordings or false statements about 
the actual behavior of the others in 
the group. 

This review covers the available 
literature from 1900 through Octo- 
ber, 1957. The bibliography was col- 
lected by searching the most relevant 
journals and published abstracts, by 
following the network of references 
from article to article, and by ob- 
taining as much unpublished re- 
search as possible. In addition, the 
earlier reviews (Bass, 1954; Bor- 
gatta, 1954; Gibb, 1950, 1954; Jen- 
kins, 1947; Roseborough, 1953; Smith 
& Krueger, 1933; Stogdill, 1948) 
which emphasize leadership or popularity
to the exclusion of other aspects
of performance covered here
have been useful. No claim is made 
to completeness, but no sources of 
known relevance have been delib- 
erately overlooked. 


The Personality Variables


The studies which meet the criteria 
for selection used over 500 different 
measures of personality. However, 
less than a quarter of these measures 
appear in more than one study. As a
commentary on the level of integra- 
tion within the field, this fact needs 
little amplification. There is a notice- 
able failure throughout these studies 
to resolve methodological issues in a 
consistent fashion. 

Clearly, it is not feasible to pres- 
ent each separate personality vari- 
able and its correlates. Some organ- 
ization of the measures was called for. 
But what organization? The field of 
personality assessment is test rich 
and integration poor. The 500 meas- 
ures all have labels, to be sure, but 
they are as divergent as oral sadism, 
the F scale, spatial ability, adven- 
turous cyclothymia, hypochondria- 
sis, and total number of vista re- 
sponses. Yet all of these measures 
have been used to predict something 
about an individual’s performance in 
groups. In addition, there are in- 
numerable adjectives used for ratings 
and self-descriptions. The situation 
required a set of personality factors 
small enough to remain manageable 
and pure enough to be meaningful, 
and then empirical grounds on which 
to classify as many of the variables 
as possible into the selected set of 
factors. 

To arrive at a useful set of person- 
ality dimensions, the empirical work 
in the field of personality assessment, 
particularly the work of French
(1953), Cattell (1946, 1956, 1957), and
Eysenck (1953), was examined. With
one exception, the seven dimensions 
or factors chosen are those frequently 
isolated in the study of personality 
by factor analytic techniques, al- 
though two emerge only as second- 
order factors in some reports. A brief 
description of each personality factor 
is presented here. 

Intelligence. This factor includes 
all the diverse and specific mental 
abilities. Sixty-nine of the 500 differ- 
ent variables included in the research 
reviewed are measures of this factor; 
of these sixty-nine measures 45 are 
derived from questionnaire and ob- 
jective tests, 24 from adjective rat- 
ings. The four most frequently used 
measures of intelligence are: school 
or college grades, American Council 
of Education (ACE) Psychological 
Exam, Cattell’s Sixteen Personality 
Factor Questionnaire (16 P.F.) Fac- 
tor B, and total number of responses 
on the Rorschach. 

Adjustment. The positive end of 
this dimension has been called adjustment,
ego strength, and normality,
while the negative end has been
called maladjustment, emotionality, 
neuroticism, psychoticism, and anx- 
iety. Seventy-one objective test and 
questionnaire variables and 60 adjec- 
tives are considered as measures of 
this factor. The most frequently used 
measures of this factor are derived 
from standard personality inven- 
tories: Minnesota Multiphasic Per- 
sonality Inventory (MMPI), Guil- 
ford-Zimmerman, Bernreuter, and 16 
P.F.

Extroversion-introversion. Eysenck 
(1953) presents the fullest discussion 
of this dimension, although the need 
to integrate as many variables as 
possible led to the use of a broader 
definition. Extroversion-introversion,
as used in this review, more closely
resembles one of Cattell’s (Cattell,
Saunders & Stice, 1951) second-order
factors from the 16 P.F., which pulls 
together the dimensions of sociabil- 
ity, surgency, and cyclothymia vs. 
schizothymia. Frequently-used meas- 
ures of this factor are: the Bernreuter 
F-2 scale (self-sufficiency), MMPI 
Hypomania scale, ratings on “sociable,”
and the relevant scales from
the 16 P.F. (Cattell, 1956) and Guilford-Zimmerman
(French, 1953). A
total of 38 questionnaire and objec- 
tive test variables and 61 adjective 
ratings were used in the studies re- 
viewed. 

Dominance. The positive end of 
the dimension is described by domi- 
nance or ascendance, the negative end 
by submissiveness or helplessness. 
Seventeen objective test and ques- 
tionnaire variables and twelve ad- 
jective-ratings which have been 
found to measure dominance were 
employed in these studies. 

Masculinity-femininity. This factor
measures the extent to which an in- 
dividual’s interests or preferences re- 
semble those common to his own or 
the opposite sex. Of the 14 question- 
naire and objective test variables and 
six ratings, the ones most frequently 
used in these studies are the mascu- 
linity-femininity scales from the 
MMPI, Guilford-Zimmerman, and 
Goodenough Speed of Association 
Test. 

Conservatism. The positive end of 
this dimension is defined by conservatism,
conventionalism, or authoritarianism,
the negative end by
radicalism. In the studies reviewed, the
measures of this factor include 36
questionnaire and objective test variables
and 11 adjective-ratings. By
far the most frequently used are the
F scale and factor Q1 from the 16
P.F.

Interpersonal sensitivity. This factor
has not been found in factor analytic
studies of personality, and some
authors have questioned whether it is 
proper to speak of empathy and in- 
sight as characteristics of the individ- 
ual. However, it is included in this 
review because it has been related to 
an individual’s status in groups a 
sufficient number of times to merit 
separate treatment. For the most 
part, the measures describe an in- 
dividual’s ability to guess (a) his own 
status in a group, (b) the status 
hierarchy of the entire group, as de- 
termined by the pooled estimates of 
the members, or (c) the opinions and 
attitudes of the other group members. 
One hundred and fifty variables 
out of the total of over 500 could not 
reasonably be classified into any one 
of the seven factors. Some of these 
150 variables fall into other known 
factors or clusters, but the number of 
additional results which could be in- 
cluded by considering them is too 
small to justify the consequent com- 
plexity of the presentation. The 
majority of the excluded variables, 
however, come from projective tests; 
in such cases, both the titles and the 
known correlations with other per- 
sonality measures combined to mys- 
tify this reviewer as to what meaning 
they might have outside the language 
system of the particular technique. 
Many projective test variables do 
not fall into stable and identifiable 
clusters or factors; further, the level 
of description used in projective tests 
makes it difficult to bridge the gap 
between the seven aspects of person- 
ality examined in this review and the 
various projective measures. Except 
for the measures of interpersonal 
sensitivity, the distribution of vari- 
ables into factors was determined by 
the empirical evidence for the meas- 
ure’s validity. Where no validity 
data were found for a measure, a cal- 
culated risk was taken in assigning it 


to a factor if the title and operation 
closely resembled the set of variables 
already chosen on empirical grounds 
as measures of the factor; this proc- 
ess accounted for no more than 50 
of the variables classified. 


The Status and Behavior Variables 


In contrast to personality vari- 
ables, measures of an individual’s 
status and behavior in groups fall 
easily into a small number of classes. 
On the basis of both operations and 
labels the following six dependent 
variables were selected: (a) leader- 
ship, (b) popularity, (c) total activity 
rate, (d) task activity, (e) social- 
emotional activity, and (f) conform- 
ity. Leadership and popularity are 
considered to be status variables; the 
remaining four are considered to be 
behavior variables. 

Leadership has been measured in 
four ways: by having an observer 
rate the individual’s attained leader- 
ship, by having an individual's peers 
rate him, by using an individual’s 
formal selection for office as the criterion
of leadership, or by having the
individual rate himself. The only 
measures included in the discussion 
of leadership which do not bear that 
label are a few measures on the in- 
dividual’s productivity and effec- 
tiveness. Popularity has been meas- 
ured by having an individual’s peers 
rate him on such dimensions as the 
extent to which they like him, find 
him acceptable as a friend, would 
choose him for leisure time activities, 
or perceive him to be popular. 

The remaining dependent variables 
are based upon actual observations of 
the individual’s behavior in the
group. Activity rate has been meas- 
ured in terms of either the number of 
acts initiated or the number of sec- 
onds spent talking. The distinction 
between task activity and social-emotional
activity is made by Bales
(1950). He distinguishes between 
task acts, relevant to the external- 
adaptive problems of the group 
(suggestions, opinions, orientations, 
and task questions), and social-emo- 
tional acts, relevant to the internal- 
integrative problems of the group 
(agreeing and disagreeing, showing 
tension release and tension, showing 
solidarity and antagonism). Meas- 
ures of behavior other than those em- 
ploying the Bales categories were 
matched as carefully as possible to 
the Bales categories and classified on 
that basis. Conforming behavior in- 
cludes all measures of an individual’s 
tendency to yield to the opinions or 
pressures of the group. 

The review thus covers seven as- 
pects of personality and six aspects 
of behavior and status. If the data 
were available in sufficient quantity, 
we would be able to examine 42 dif- 
ferent relationships between person- 
ality and behavior or status. 
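
The count of 42 follows from pairing each personality factor with each status or behavior variable. A minimal sketch, using the variable names given in the text as plain strings:

```python
from itertools import product

# The seven personality factors described in the review.
personality = [
    "intelligence",
    "adjustment",
    "extroversion-introversion",
    "dominance",
    "masculinity-femininity",
    "conservatism",
    "interpersonal sensitivity",
]

# The six status and behavior variables described in the review.
performance = [
    "leadership",
    "popularity",
    "total activity rate",
    "task activity",
    "social-emotional activity",
    "conformity",
]

# Every possible personality-by-performance relationship.
relationships = list(product(personality, performance))
print(len(relationships))
```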


Method of Presentation 


One final issue, the most appro- 
priate unit of research, must be dis- 
cussed before the presentation of the 
findings. The problem arises from 
the fact that a single study may con- 
tain, for example, more than one 
measure of leadership and more than 
one measure of intelligence. On the 
one hand, we might consider the 
study as the unit, examining only the 
over-all trend of the many results. 
On the other hand, we might con- 
sider each result as the unit, examin- 
ing the findings from a study in as 
much detail as possible. 

The advantage of using a whole 
study as the unit is that units are 
then independent, and, therefore, 
statistical tests of the significance of 
the trends are possible. Another approach
is to consider as the separate
unit each result, that is, each correlation
or measure of difference between
groups. This can lead to overrepre- 
sentation of a particular sample and 
a particular set of measures in the 
total summary of research to date. 
Moreover, it is not possible to use 
statistical tests to evaluate trends 
based on more than one result per 
study, since using the same subjects 
and then using independent or de- 
pendent variables which are highly 
correlated with each other would 
violate the assumption of independ- 
ence which underlies statistical tests. 

If each relationship had been in- 
vestigated in a sufficient number of 
studies to permit statistical tests in 
most cases, we would have chosen 
studies as the units. Because such is 
not the case, we have chosen the re- 
sult as the unit of research, but it is 
recognized that, for the above-men- 
tioned reasons, any trends based on 
separate results must remain as de- 
scriptive indications of the findings 
to date. Where the number of studies 
is sufficient to provide an opportu- 
nity to use statistical tests, the tests 
will be made. 

There are a number of advantages, 
however, to using the results as units. 
Over 1400 results are examined in 
this review. The far greater number 
of results may compensate for the dis- 
advantages of this approach by of- 
fering greater stability to the trends. 

The association between a person- 
ality variable and a status or be- 
havior variable is reported in one of 
eight forms: (a) positive and signifi- 
cant, (b) positive and not signifi- 
cant, (c) positive but no report of 
significance, (d) negative and signifi-
cant, (e) negative and not signifi-
cant, (f) negative but no report of
significance, (g) zero correlation, and
(h) not significant but no report of
direction. Throughout this review a
positive finding refers to the associa-
tion between the positive ends of the
personality and performance vari-
ables as described earlier, not to the
confirmation of an hypothesis. The
.05 level is accepted as the criterion
of significance.

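Each relationship for which five or
more results are available and which
has been investigated in more than
one study is examined in detail.
Three summary statistics are used
throughout to summarize the findings
for each relationship. First, the over-
all direction of the results is shown by
the percentage of results which are
positive; this is calculated by divid-
ing the number of positive results by
the total number of results which
indicate direction. Second, the per-
centage of the significant results
which are positive is shown. Third,
to indicate the significance of the
results underlying the trends, the
percentage of the total number of
results which are both significant and
in the direction of the over-all trend
is shown; if the trend is positive,
this is calculated by dividing the
number of significantly positive re-
sults (a) by the total number of re-
sults minus the number which are
positive but untested (c); if the trend
is negative, this is calculated by
dividing the number of significantly
negative results (d) by the total
number of results minus the number
which are negative but untested (f).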

There appears to be a general be-
lief that many inconclusive and nega-
tive findings are filed away into ob-
scurity, doomed never to enter the
professional literature. To the extent
that this bias exists in the area re-
viewed, the trends are misleading.
This reviewer has succeeded in ob-
taining some unpublished data and
doctoral dissertations in an [...]
data included here are in almost per-
fect agreement with the data in the
journals and monographs. This
seems to suggest that considerations
[...] results operate to determine which
results will be published.


Leadership


Viewed historically, the study of
leadership has stimulated more than
[...] controversy. The trait
approach to leadership, the view that
leadership is an [...]
[...] have spoken of an individual as
possessing a measurable quantity of
leadership was perhaps an unfortu-
nate choice of words. The clear im-
plication of such a statement is that
since leadership is specific to the in-
dividual, it will remain [...]
the individual regardless of the situa-
tion in which [...]
phenomenon, created through the in-
teraction of individuals (leaders and
followers), and that the selection and 
stability of any leadership pattern is 
a function of the task, composition, 
and culture of the group. From all 
this work has emerged some such 
summary formulation as that an in- 
dividual’s leadership status in groups 
is a joint function of his personality 
and the particular group setting. 
There is an interesting parallel here 
to the controversy over the role of 
heredity and environment in deter- 
mining behavior; the initial criti- 
cisms and intensity gave way to con- 
cessions that each factor sets limits 
for the operation of the other, and 
researchers turned to studying the 
relative importance of and the inter- 
action between the two major fac- 
tors. 

Table 1 presents a summary of the
relationships between seven aspects
of personality and leadership. Shown
there are the number of relevant
studies, the number of results contained
in those studies, the distribution of
results into the various forms in
which they are reported, and the
three summary statistics. The re-
sults are reported in eight forms: the
positive and negative associations
may be significant (Sig.), not signifi-
cant (N.S.), or untested (Unt.); the
remaining two forms, zero correlation
(zero) and not significant but no di-
rection reported (?N.S.) are com-
bined in the table. The base num-
bers for the summary percentages
are enclosed in parentheses below the
percentages. The base number for
the percentage of results which are
positive (i) is the total number of re-
sults which indicate direction; the
base number for the percentage of
significant results which are positive
(j) is the total number of significant
results; the base number for the per-
centage of results which are both
significant and in the direction of the
over-all trend (k) is the total number
of results minus the positive but un-
tested (c) or negative but untested
(f) results, depending on the direction
of the trend. A separate section
covers each relationship between an
aspect of personality and leadership.

TABLE 1
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND LEADERSHIP

                No. of  No. of     Positive          Negative      Zero or         %  % of Sig.   % Sig. & in
Personality    Studies Results  Sig.  N.S.  Unt.  Sig.  N.S.  Unt.   ?N.S.  Positive   Positive Dir. of Trend
Factors                          (a)   (b)   (c)   (d)   (e)   (f)  (g, h)       (i)        (j)           (k)

Intelligence        28     196    91    68    14     1    22     0       0        88         99            50
                                                                               (196)       (92)         (182)
Adjustment          22     164    50    55    14     2    28     0      15        80         96            33
                                                                               (149)       (52)         (150)
Extroversion        22     119    37    38     6     6    23     3       6        72         85            33
                                                                               (113)       (43)         (113)
Dominance           12      39    15     9     3     6     4     0       2        73         71            42
                                                                                (37)       (21)          (36)
Masculinity          9      70    11    37     0     1    19     0       2        71         92            16
                                                                                (68)       (12)          (68)
Conservatism        17      62     3    18     0    17    21     3       0        34         15            29
                                                                                (62)       (20)          (59)
Sensitivity         15     101    15    55     3     1    25     0       2        74         94            15
                                                                                (99)       (16)          (98)

Intelligence. Twenty-eight of the
studies reviewed (Arbous & Maree,
1951; Bass, 1951b; Bass & Coates,
1952; Bass, McGehee, Hawkins,
Young, & Gebel, 1953; Bass & Wurs-
ter, 1953a; Bass, Wurster, Doll, &
Clair, 1953; Borgatta, 1953; Carter & 
Nixon, 1949; Cattell & Stice, 1954; 
Cobb, 1952; Cowley, 1931; Dunkerly, 
1940; Flemming, 1935; French, 1951; 
Gibb, 1949a; Gordon, 1952; Gowan, 
1955; Green, 1950; Howell, 1942; 
Hunter & Jordan, 1939; McCuen, 
1929; Richardson & Hanawalt, 1943; 
Riggs, 1953; Slater, 1955a; Stolper, 
1953; Sward, 1933; Wurster & Bass, 
1953; Zeleny, 1939) have investigated 
the association between an individ- 
ual’s intelligence and his leadership 
status in one or more groups. These 
studies contain 196 results, 173 (or 
88%) of which indicate a positive 
relationship between intelligence and 
leadership. Furthermore, 91 (or 99%) 
of the 92 significant results are in the 
positive direction. Omitting those re- 
sults which are positive but untested 
for significance, exactly half of the re- 
maining 182 results are both positive 
and significant at the .05 level. Con- 
sidering independent studies as the 
units of research, the positive asso- 
ciation between intelligence and lead- 
ership is found to be highly significant 
(p <.01) by the sign test. However, 
the magnitude of the relationship is 
less impressive; no correlation re- 
ported exceeds .50, and the median 
r is roughly .25. 
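
The three summary percentages quoted for intelligence can be checked with a short sketch. The counts below are read from the intelligence row of Table 1; the class and function names are illustrative assumptions, not part of the review, and the rule for deciding the direction of the trend (a simple majority of directional results) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ResultCounts:
    """Counts of results in each reporting form, lettered as in the text."""
    a: int   # positive and significant
    b: int   # positive and not significant
    c: int   # positive but untested
    d: int   # negative and significant
    e: int   # negative and not significant
    f: int   # negative but untested
    gh: int  # zero correlation, or not significant with no direction


def summary_statistics(n: ResultCounts) -> Tuple[int, Optional[int], int]:
    """Return (i, j, k): % positive, % of significant results that are
    positive, and % both significant and in the direction of the trend."""
    positive = n.a + n.b + n.c
    negative = n.d + n.e + n.f
    directional = positive + negative
    significant = n.a + n.d
    total = directional + n.gh

    pct_positive = round(100 * positive / directional)
    # With no significant results the second percentage is undefined.
    pct_sig_positive = round(100 * n.a / significant) if significant else None
    # Assumed rule: the trend is positive when positives are the majority.
    if positive >= negative:
        pct_sig_in_trend = round(100 * n.a / (total - n.c))
    else:
        pct_sig_in_trend = round(100 * n.d / (total - n.f))
    return pct_positive, pct_sig_positive, pct_sig_in_trend


intelligence = ResultCounts(a=91, b=68, c=14, d=1, e=22, f=0, gh=0)
print(summary_statistics(intelligence))  # (88, 99, 50)
```

The bases used here (196 directional results, 92 significant results, and 182 results after omitting the untested positive ones) match the figures given in the paragraph above.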

There is some indication that 
verbal intelligence is a better pre- 
dictor of leadership than such non- 
verbal factors as memory and nu- 
merical ability. Grades are not 
strongly related to leadership in col- 
lege social groups, although this fact 
may reflect competition between 
scholastic and social activities for the 
student’s time and energy. 

There would seem to be little 
doubt that higher intelligence is as-
sociated with the attainment of lead-
ership in small groups. That the null
hypothesis may be emphatically re-
jected should not obscure the fact
that the magnitude of the relation- 
ship is not high. 

Adjustment. The 22 studies (Bass, 
McGehee et al., 1953; Bass, Wurster 
et al., 1953; Borgatta, 1953; Carter & 
Nixon, 1949; Cattell & Stice, 1954; 
Cowley, 1931; Dexter & Stein, 1955; 
Dunkerly, 1940; Flemming, 1935; 
French, 1951; Gibb, 1949a; Gordon, 
1952; Gowan, 1955; Holtzman, 1952; 
Hunter & Jordan, 1939; Richardson 
& Hanawalt, 1943, 1944, 1952; 
Slater, 1955a; Stolper, 1953; Wil- 
liamson & Hoyt, 1952; Zeleny, 1939) 
relating the personal adjustment of 
the individual to his leadership status 
yield 164 results. The trend of the 
results is clearly positive, as indi- 
cated by the fact that 80% of the 
results are in the positive direction. 
If only the 52 significant results are 
considered, the proportion of positive 
results rises to 96%. One third of the 
results are both positive and signifi- 
cant. The over-all trend within 
every study but one is positive, and 
the sign test indicates that the null 
hypothesis of no association may be 
rejected at the .01 level. No single 
variable measuring adjustment is cor- 
related with leadership over .53, and 
the median correlation appears to lie 
close to .15. 
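
The sign test on study-level trends can be sketched as follows. This is a minimal illustration, assuming that each of the 22 adjustment studies yields a directional over-all trend (no ties), so that 21 trends are positive and 1 is negative; the function name is hypothetical, and the two-sided form of the test is an assumption, since the review does not say which form was used.

```python
from math import comb


def sign_test_p(n_positive: int, n_negative: int) -> float:
    """Two-sided sign test: probability, under the null hypothesis that
    positive and negative trends are equally likely, of a split at least
    as lopsided as the one observed."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # One tail of the Binomial(n, 1/2) distribution, then doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed split for the adjustment studies: 21 positive, 1 negative.
p_adjustment = sign_test_p(21, 1)
print(p_adjustment < 0.01)  # True
```

Because whole studies are the units here, the independence assumption discussed earlier is respected, which is why the test is applied to studies rather than to the 164 separate results.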

Four studies using the Bernreuter 
(Gowan, 1955; Richardson & Han- 
awalt, 1943, 1944, 1952) and one 
using the 16 P.F. (Cattell & Stice, 
1954) present the most striking evi- 
dence of this positive association but 
the various techniques for measuring 
adjustment (questionnaires, objec- 
tive tests, and ratings) are about 
equally productive of positive results. 
While no single measure of adjust- 
ment can be expected to be an effi- 
cient predictor of leadership, there is 
strong evidence to indicate a positive
relationship between an individual's
adjustment and the leadership status
he is likely to attain.

Extroversion-introversion. Twenty-
two studies (Bass, McGehee et al.,
1953; Bass, Wurster et al., 1953; 
Borgatta, 1953; Carter & Nixon, 
1949; Cattell & Stice, 1954; Cowley, 
1931; Dexter & Stein, 1955; Dun- 
kerly, 1940; Flemming, 1935; French, 
1951; Gordon, 1952; Gowan, 1955; 
Hunter & Jordan, 1939; Moore, 1935; 
Richardson & Hanawalt, 1943, 1952; 
Slater, 1955a; Stolper, 1953; Sward, 
1933; Williamson & Hoyt, 1952; 
Zeleny, 1939) have investigated the 
association between extroversion and 
leadership; 72% of the results are 
positive, and 85% of the 43 signifi- 
cant results are positive. The non- 
chance character of this association 
is suggested by the fact that 33% of 
the results are both significant and 
positive. Finally, the sign test on
the over-all trends for the independ-
ent studies reveals that the positive
association is significant at the .01
level.

No single measure of extroversion
is consistently related to leadership,
with the possible exception of the
relevant Guilford-Zimmerman scales.
The median correlation is roughly
.15, and the highest correlation re-
ported is .42. Those individuals who
tend to be selected as leaders are
more sociable and outgoing, al-
though the process of inferring such a
characterization from the titles of
the personality scales is a tenuous
matter at best.

Dominance. Twelve studies (Bass,
McGehee et al., 1953; Bass, Wurster
et al., 1953; Borgatta, 1953; Carter &
Nixon, 1949; Cattell & Stice, 1954;
Cobb, 1952; Cowley, 1931; Dexter &
Stein, 1955; Gordon, 1952; Moore,
1935; Stolper, 1953; Zeleny, 1939)
have investigated whether domi-
nance, as measured by personality
scales, is associated with an individ-
ual’s leadership status; 73% of the
results are positive, and 71% of the
21 significant results are positive.
The significance of the positive as-
sociation is suggested by the fact
that 42% of the results are both posi-
tive and significant. No correlation
reported exceeds .42, and the median
correlation is roughly .20.

The two measures of dominance
which yield the best evidence for a
positive relationship between domi-
nance and leadership are the Ascend-
ence and Dominance scales from the
Guilford-Zimmerman and 16 P.F.,
respectively. Particularly unsuccess-
ful, however, have been the attempts
to use Allport’s Ascendence-Submis-
sion Test. Although the trend is not
very strong, these data suggest that
dominant or ascendent individuals
have a greater chance of being desig-
nated leader.

Masculinity-femininity. There is a
slight positive association between
masculinity and leadership status;
71% of the results are positive. Al-
though 92% of the 12 significant re-
sults are positive, significant results
are found in only two of the nine
studies. No single measure of mas-
culinity relates to leadership in a con-
sistently positive direction, and the
correlations are uniformly low (Bass,
Wurster et al., 1953; Bell, 1952;
Carter & Nixon, 1949; Cobb, 1952;
Dexter & Stein, 1955; Gordon, 1952;
Slater, 1955a; Stolper, 1953; Zeleny,
1939).

Conservatism. Only one measure of 
this factor displays any consistency 
in its association with leadership. The 
California F scale, a measure of 
authoritarian trends within the per- 
sonality, has been used 10 times in 
the prediction of leadership. In each 
case, high-F, or authoritarian, indi-
viduals were found to be rated lower
on leadership than nonauthoritarian
individuals. In general, there is a
negative association between con-
servatism and leadership. This is
especially evident within the signifi-
cant results, 17 out of 20 being in the
negative direction (Bass & Coates,
1952; Bass, McGehee et al., 1953;
Bass, Wurster et al., 1953; Carter &
Nixon, 1949; Cattell & Stice, 1954;
Cowley, 1931; Flemming, 1935;
French, 1951; Hays, 1953; Haythorn,
Couch, Haefner, Langham, & Carter,
1956a; Hollander, 1954; Hunter
& Jordan, 1939; Martin, Gross, &
Darley, 1952; Masling, Greer, & Gil-
more, 1955; Slater, 1955a, 1955b;
Stolper, 1953).

Interpersonal Sensitivity. Few areas 
covered by this review contain so 
much research which builds upon 
prior results as this one. Unfortu- 
nately, few are so plagued by diffi- 
culties and contradictory evidence. 

The over-all trend of the results is
positive; in 74% of the cases leaders
are found to be more accurate in esti-
mating various aspects of the opin-
ions of other group members than
nonleaders. More impressive is the
fact that 15 out of the 16 significant
results indicate greater insight among
leaders. Although two of the relevant
studies report a zero correlation be-
tween interpersonal sensitivity and
leadership, the trends of the results
in the remaining 13 studies are posi-
tive. It would appear that while most
researchers have been unable to ob-
tain positive results which are sta-
tistically significant, they have ob-
tained positive results with impres-
sive consistency (Bell, 1952; Bell &
Hall, 1954; Campbell, 1953; Chow-
dhry, 1948; Chowdhry & Newcomb,
1952; Gage & Exline, 1953; Green,
1948; Greer, Galanter, & Nordlie,
1954; Hites & Campbell, 1950; Nord-
lie, 1954; Smith, Jaffe, & Livingston,
1955; Sprunger, 1949; Stolper, 1953;
Trapp, 1955; Zeleny, 1939).

According to Campbell (1955), one
part of these results is open to a seri- 
ous methodological criticism. When 
interpersonal sensitivity is measured 
in terms of an individual’s accuracy 
in guessing how his peers will rate 
him on leadership, the correlation be- 
tween interpersonal sensitivity and 
leadership is spuriously positive. If 
accuracy is measured by the dis- 
crepancy between an individual’s ac- 
tual leadership status and his guessed 
leadership status, and if, further, 
there is a tendency for most indi- 
viduals to guess that they will be 
rated as having fairly high status, 
then the higher the actual status, the 
less the discrepancy and the higher 
the apparent interpersonal sensitiv- 
ity. Thus, the positive correlation be- 
tween actual leadership status and 
this accuracy score is a statistical 
artifact. The cogency of Campbell’s 
criticism may be reflected in the fact 
that 14 of the 17 correlations reported 
between actual leadership status and 
accuracy about one’s own leadership 
status are positive. Since the propor-
tion of these questionable results
which are positive (82%) is higher
than the proportion of results remain-
ing (70%) when these are eliminated,
the validity of Campbell’s criticism
is at least suggested.
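
Campbell's argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of any study: the 1-to-9 status scale, the sample size, and the distribution of guesses are all hypothetical. Each simulated person guesses a fairly high status regardless of actual status, and accuracy is scored as the negative of the discrepancy between guessed and actual status.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical actual leadership status on a 1-9 scale.
actual = [rng.uniform(1, 9) for _ in range(500)]
# Guesses are high on average and unrelated to actual status,
# so no one in this simulation has any real insight.
guessed = [min(9.0, max(1.0, rng.gauss(7.5, 1.0))) for _ in actual]
# Accuracy score: the smaller the discrepancy, the higher the score.
accuracy = [-abs(a - g) for a, g in zip(actual, guessed)]

r_artifact = pearson(actual, accuracy)
print(r_artifact > 0)  # True: a positive correlation with no insight at all

# Proportion of the questionable results reported as positive in the text.
print(round(100 * 14 / 17))  # 82
```

Even though guesses carry no information about actual status, high-status members land closer to the common high guess, which is enough to produce the positive correlation Campbell describes.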

There are a number of problems of
interpretation in this area of re-
search. Gage and Cronbach (1955)
have written a penetrating analysis
of the difficulties in measuring inter-
personal sensitivity. Among other
things, they point out the importance
of controlling the contribution of the
individual’s actual similarity to others
[...]
matter than to conclude that his
opinion is more similar to the average 
opinion. More rigorous examination 
of the components of interpersonal 
sensitivity and their various associa- 
tions with leadership remains a task 
for future research. 

A second problem arises when one 
attempts to specify which of the 
many items of group opinion leaders 
may be expected to estimate more 
accurately than nonleaders. Accord- 
ing to Chowdhry and Newcomb 
(Chowdhry, 1948; Chowdhry & New- 
comb, 1952) the item cannot be too 
irrelevant to the group under study 
or the leader will not have adequate 
data on which to base his estimate. 
On the other hand, according to New- 
comb (1954) the item cannot be too 
relevant or everyone will know the 
opinion of everyone else, and the dif- 
ference will disappear. Chowdhry 
and Newcomb are proposing a range 
of relevance within which accuracy 
in estimating group opinion will be 
positively related to leadership 
status. In the absence of an objective 
definition of relevance, this proposi- 
tion, for all its attractiveness on the 
common sense level, has remained an 
ad hoc instrument to be wielded 
against conflicting results. An ex- 
amination of five studies (Campbell, 
1953; Gage & Exline, 1953; Greer et 
al., 1954; Hites & Campbell, 1950; 
Trapp, 1955) subsequent to Chow- 
dhry and Newcomb’s reveals a low 
positive relationship between leader- 
ship and accuracy, but fluctuations in 
the magnitude of the association can- 
not be related to the relevance of the 
items because no valid scale of rele- 
vance can be applied across studies. 

One additional fact emerges from 
the research in this area. Group 
members believe that their leaders 
are more aware of their opinions and 
feelings than the nonleaders of the
group (Campbell, 1953; Sprunger,
1949; Zeleny, 1939). In summary,
there appears to be a low but clearly 
positive relationship between inter- 
personal sensitivity and leadership. 
However, methodological and con- 
ceptual problems remain which can 
be resolved only by future research. 

Techniques of measurement. Lead- 
ership status has been measured in at 
least four ways: by observer ratings, 
by peer ratings, by criterion meas-
ures, and by self-ratings. The last
technique has been used only once in 
these studies, but for the three re- 
maining techniques it is possible to 
ask whether different results are ob- 
tained when different techniques are 
used. 

Peer ratings and criterion measures 
rest upon the estimates of an indi-
vidual’s peers. Peer ratings are es-
sentially descriptions of an indi-
vidual’s present leadership status,
whereas criterion measures reflect the
group’s selection for future leader-
ship. The peer ratings are assess-
ments of the informal leadership
structure, whereas criterion measures 
reflect the formal leadership struc- 
ture. Numerous studies have noted 
that there is seldom complete cor- 
respondence between the designa- 
tions which emerge from these two 
approaches. Observer ratings meas- 
ure the present informal leadership 
structure of the group, but the evalu- 
ation is made by someone outside the 
group, in most cases by someone not 
personally involved in the future of 
the group, and, therefore, the ob- 
server is not implicitly locating him- 
self on the status hierarchy by the 
act of rating. Finally, the observer 
is a member of a unique species of 
judging humanity, a social scientist, 
with special training and perhaps 
even special criteria of leadership. 

In the cases of intelligence, ad-
justment, and extroversion vs. lead-
ership, the number of results is large
enough to permit detailed analysis of 
the relationships in terms of the dif- 
ferent measuring techniques em- 
ployed. Table 2 shows the percentage 
of results which are positive when the 
three techniques of measuring leader- 
ship are related to the personality 
factors of intelligence, adjustment, 
and extroversion. The base numbers 
are shown in parentheses. 

TABLE 2
PERCENTAGE OF RESULTS POSITIVE

                  Peer     Criterion   Observer
                 Ratings   Measures    Ratings

Intelligence       91         85          89
                  (66)       (40)        (69)
Adjustment         97         76          76
                  (31)       (87)        (41)
Extroversion       50         86          70
                  (30)       (58)        (35)

The relationship between intelli-
gence and leadership appears to be
quite independent of the techniques
of measuring leadership. On the
other hand, there is a striking differ-
ence between the way adjustment
and extroversion are related to lead-
ership as the technique of measure-
ment varies. Adjustment is posi-
tively related to peer ratings on lead-
ership in 97% of the cases, while it is
positively related to criterion [...]
criterion measures. [...]
an individual’s adjustment [...]
important in determining [...] infor-
mal leadership status (peer ratings)
than his formal leadership status (cri-
terion measures). In contrast, extro-
verted individuals are no more likely
than introverted individuals to be
rated as informal leaders by their
peers, but they are quite likely to be
selected as the formal leader for the
future. Finally, scanning the third
column of the table, it may be noted
that intelligence is more consistently
related to observer ratings than
either adjustment or extroversion.

It does appear that different as-
pects of an individual’s status are be-
ing measured by these techniques,
and that these different aspects are
not uniformly related to his person-
ality. This crude division of the op-
erations into peer ratings, criterion
measures, and observer ratings sug-
gests at least two dimensions of pos-
sible relevance. It may be important
to differentiate between descriptions
of present or informal leadership and
choices for future or formal leader-
ship; to rate on leadership and to
[...] leadership may engage
quite different standards on the part
of the group member. Secondly, it
[...]

[...] In addition, dominance, masculinity,
and interpersonal sensitivity are
found to be positively related to
leadership, while conservatism is
[...] to be negatively related to
leadership. Finally, evidence has
been presented to indicate that the
relationship between personality fac-
tors and leadership varies with the
technique of measuring leadership.


Popularity

The personality determinants of
individual popularity have received
less attention than the determinants
of leadership. At the same time, how-
ever, the importance of personality
factors has been more or less as-
sumed, and the situational approach
to popularity is not well developed.
While less is known about the actual
consistency with which an individual
maintains his popularity in different
groups and across changing condi-
tions, there is reason to believe that
popularity, no less than leadership,
may be profitably examined in terms
of both personality and situational
factors.

Table 3 presents a summary of the
relationships between seven aspects
of personality and an individual's
popularity in groups. Since this
table is constructed in a manner
parallel to Table 1, no detailed ex-
planation of its form will be given.

TABLE 3
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND POPULARITY

                No. of  No. of     Positive          Negative      Zero or         %  % of Sig.   % Sig. & in
Personality    Studies Results  Sig.  N.S.  Unt.  Sig.  N.S.  Unt.   ?N.S.  Positive   Positive Dir. of Trend
Factors                          (a)   (b)   (c)   (d)   (e)   (f)  (g, h)       (i)        (j)           (k)

Intelligence        13      38     6    23     0     1     6     0       2        81         86            17
                                                                                (36)        (7)          (36)
Adjustment          18      78    15    34     5     0    19     0       5        74        100            21
                                                                                (73)       (15)          (73)
Extroversion        13      46     9    13     9     1     5     0       9        84         90            24
                                                                                (37)       (10)          (37)
Dominance            6       9     0     5     0     2     1     0       1        63          0             0
                                                                                 (8)        (2)           (9)
Masculinity          4       8     0     5     0     0     3     0       0        63          -             0
                                                                                 (8)        (0)           (8)
Conservatism        11      18     3     7     1     2     2     0       3        73         60            18
                                                                                (15)        (5)          (17)
Sensitivity         11      38     6    16     0     1    13     0       2        61         86            16
                                                                                (36)        (7)          (38)

Intelligence. Thirteen studies 
(Bass, Wurster et al., 1953; Bonney,
Hoblit, & Dreyer, 1953; Borgatta,
1953; Burks, 1937; Cronbach, 1950; 
Fiedler, Doyle, Jones, & Hutchins, 
1957; French & Mensh, 1948; Kelly, 
1957; Mill, 1953; Reilly, 1947; Riggs, 
1953; Shapiro, 1953; Slater, 1955a) 
have related an individual’s intelli- 
gence to his popularity. An exam- 
ination of the 38 results shows that 
81% are positive and 86% of the 
seven significant results are positive; 
17% of the results are both positive 
and significant. The maximum cor- 
relation obtained is .37, and the 
median correlation is no higher than 
.10. 

College grades are more strongly 
related to popularity than any other 
measure of intelligence. In contrast, 
it may be remembered that grades 
were less strongly related to leader- 
ship than other measures of intelli- 
gence. In general, there appears to 
be a tendency for intelligent indi-
viduals to be more popular.

Adjustment. All of the 15 signifi-
cant results relating an individual's
personal adjustment to his popularity
are in the positive direction, but when
the insignificant and untested results
are included, the proportion falls to
74%. Although several of the cor-
relations reported are over .50, the
median is close to .10 (Bass, Wurster
et al., 1953; Bonney et al., 1953; 
Borgatta, 1953; Burks, 1938; Cat- 
tell, 1934; Cohen, 1954; Cronbach, 
1950; Fiedler et al., 1957; French & 
Mensh, 1948; Guthrie, 1956; Kelly, 
1957; Martin et al., 1952; Mill, 1952, 
1953; Shapiro, 1953; Slater, 1955a; 
Tagiuri, 1952). 

No single measure of adjustment is 
convincingly related to popularity, 
with one exception (Guthrie, 1956), 
an opinion survey designed to meas- 
ure “satisfactory personal habits.” 
There is some indication in these data 
that more popular persons are better 
adjusted. 

Extroversion. When the separate 
results within each study are pooled 
and the trend over independent 
studies is assessed, it is found that 11 
of the 12 trends are in the positive 
direction. Further indication that ex- 
troversion is positively associated 
with popularity comes from the 46 
separate results; 84% of the results
and 90% of the significant results are
positive.

The scales [...]
on the 16 P.F. are [...]
The [...]
individual emerges
as a sociable, surgent, and emotion-
ally labile person (Bass, [...]
Mensh, 1948; Kelly, 1957; Lemann &
Solomon, 1952; Mill, 1952, 1953; 
Shapiro, 1953; Slater, 1955a). 

Dominance. On the basis of nine
contradictory results, little can be
said about the relationship between
dominance and popularity. The
trend is positive, but the two signifi- 
cant results are negative (Bass, 
Wurster et al., 1953; Bonney et al., 
1953; Borgatta, 1953; Kelly, 1957; 
Lemann & Solomon, 1952; Shapiro, 
1953). 

Masculinity-femininity. None of
the attempts to relate masculinity to
popularity have yielded significant
results, and the trend, though posi- 
tive, is weak (Bass & Wurster, 1953a; 
Mill, 1953; Shapiro, 1953; Slater, 
1955a). 

Conservatism. Conservatism is
positively associated with popularity
in 73% of the results. More popular 
individuals tend to be more conserva- 
tive, conventional, or authoritarian 
(Bass, Wurster et al., 1953; Bonney 
et al., 1953; French & Mensh, 1948; 
Hays, 1953; Kelly, 1957; Martin et 
al., 1952; Masling et al., 1955; 
Rohde, 1951; Shapiro, 1953; Slater, 
1955a). 

Interpersonal sensitivity. The rela-
tionship between empathy or inter-
personal sensitivity and popularity
has been investigated in 11 studies
(Ausubel, 1955; Ausubel & Schiff,
1955; Gage & Exline, 1953; Greer et
al., 1954; Lemann [...]
[...] are positive, this [...]
must be examined
more carefully.

If Campbell’s (1955) criticism is
[...] part on the
person’s actual
status, this produces spuriously posi-
tive correlations. Eliminating the re-
sults to which Campbell's criticism
would be directed, the proportion of 
positive results among those remain- 
ing is only slightly over 50%. Until 
the possibility can be discounted that 
a number of spurious results are em- 
bedded in this body of data, the 
direction of this relationship cannot 
be estimated with safety. 

Summary. Extroversion, intelli- 
gence, adjustment, and conserva- 
tism are found to be positively re- 
lated to popularity. The research to 
date, for various reasons, provides no 
definite answer to the question of how 
dominance, masculinity, and inter- 
personal sensitivity are related to 
popularity. 


Total Activity Rate 


Only three aspects of personality 
have been related to total activity 
rate a sufficient number of times to 
warrant their inclusion in this review. 
Table 4 presents a summary of the 
relationships of intelligence, adjust- 
ment, and extroversion to the indi- 
vidual’s total activity. 

Intelligence. The relationship be- 
tween intelligence and activity rate 
could hardly be clearer. All 36 re- 
sults are positive, and one-third of 
the results are significant. The 
median correlation is between .15
and .20; the highest correlation re-
ported is .34. The data leave little
doubt that the relationship between 
intelligence and total amount of par- 
ticipation, although of low magni- 
tude, is positive (Bass, 1951b; Bass, 
Wurster et al., 1953; Borgatta, 1953; 
Brown, 1950; Slater, 1955a; Zeleny, 
1939). 

Adjustment. Roughly three-quar- 
ters of the total number of results in- 
dicate a positive relationship be- 
tween adjustment and total activity 
rate, but few of the results reach sig- 
nificance. With particular consist- 
ency, adjustment as measured by the 
MMPI is positively related to the 
total amount of an individual’s par- 
ticipation (Bass, Wurster et al., 1953; 
Borgatta, 1953; Brown, 1950; Cer- 
vin, 1956, 1957; Slater, 1955a). 

Extroversion. Measures of extro- 
version are positively related to total 
activity rate in 11 (or 79%) of the 14 
results; all four significant results are 
positive. Two studies (Brown, 1950;
Slater, 1955a) report a positive cor-
relation between total activity rate
and the Hypomania scale of the
MMPI, which indicates greater mal-
adjustment among high
participants. Since other research
(French, 1953) has shown that the 
Hypomania scale measures both ex- 
troversion and maladjustment, these 
results at least suggest that extrover- 
sion may be more strongly related to 
total activity rate than adjustment 
(Bass, Wurster et al., 1953; Bor-
gatta, 1953; Brown, 1950; Slater,
1955a; Zeleny, 1939).

TABLE 4
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND TOTAL ACTIVITY RATE

                No. of  No. of     Positive          Negative      Zero or         %  % of Sig.   % Sig. & in
Personality    Studies Results  Sig.  N.S.  Unt.  Sig.  N.S.  Unt.   ?N.S.  Positive   Positive Dir. of Trend
Factors                          (a)   (b)   (c)   (d)   (e)   (f)  (g, h)       (i)        (j)           (k)

Intelligence         6      36    12    24     0     0     0     0       0       100        100            33
                                                                                (36)       (12)          (36)
Adjustment     [row illegible in source]
Extroversion         5      14     4     7     0     0     3     0       0        79        100            29
                                                                                (14)        (4)          (14)

Summary. Of the three personality 
measures used with any frequency, 
intelligence stands out as the per- 
sonality characteristic most conclu- 
sively related to activity rate. Extro- 
version and adjustment also seem to 
bear a positive relationship to ac- 
tivity rate. 


Task Activity 


Task activity includes measures of 
the frequency with which an indi- 
vidual gives suggestions, opinions, 
and orientations and asks questions. 
It is necessary, therefore, to differen- 
tiate between task contribution and 
task questions. Unfortunately, per- 
sonality variables have not been re- 
lated to the frequency of asking 
questions a sufficient number of 
times to be included here. Therefore, 
this section deals exclusively with 
task contributions. 

There is room for doubt, however,
whether task activity deserves to be
treated independently of total
activity, since correlations as high as
.93 between the number of task
contributions and the total number of
acts initiated have been reported
(Borgatta, 1953). This is hardly
surprising in view of the fact that in
some studies two-thirds or more of
the total number of acts are task
contributions. There is no question
that the operation for determining
the number of task contributions is
distinct from the operation for
measuring total activity. The issue is
whether, in the light of the high
correlation between these two measures,
results based upon task contributions
are not actually misleading.
The implication is that one category
of behavior, task activity, is
meaningfully related to some personality
variable. But if task activity plus
nontask activity is related to the
personality characteristic in a nearly
identical fashion, what is the value of
the categorization? Total activity
rate accounts for both relationships
more parsimoniously.

Two researchers, Borgatta (1953) 
and Slater (1955a) were aware of 
such difficulties. Arguing that an 
individual’s task-relevant behavior 
should be considered independently 
of his total activity rate, they meas- 
ured task activity by taking the 
percentage of his total activity which 
fell within the task contribution area. 
It is possible to use their data to 
examine the relationship between 
personality characteristics and task 
activity, controlling for total ac- 
tivity rate. In fact, the best argu- 
ment for including a separate sec- 
tion in this review devoted to task 
activity is that Borgatta and Slater’s 
percentage data raise a separate
issue. Their data provide an estimate
of the relation between an
individual’s personality characteristics
and the extent to which he concentrates
his activity in the task contribution
area.
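
The difference between the raw measure and the percentage measure can be made concrete with a small sketch. The two group members and their act counts below are invented purely for illustration, and `task_share` is a hypothetical helper name, not something taken from Borgatta or Slater.

```python
def task_share(task_contributions: int, total_acts: int) -> float:
    """Percentage of a member's total acts that are task contributions."""
    if total_acts <= 0:
        raise ValueError("total_acts must be positive")
    return 100.0 * task_contributions / total_acts


# Hypothetical members: (task contributions, total acts initiated).
members = {"A": (60, 100), "B": (30, 40)}

raw = {name: task for name, (task, _) in members.items()}
share = {name: task_share(task, total) for name, (task, total) in members.items()}

# A leads on the raw count, but B concentrates more of his activity on the task.
print(raw)
print(share)
```

A trait that goes with talking more would correlate positively with the raw count for member A's kind of profile even if it had no bearing on where activity is concentrated, which is why the percentage measure is treated as the control for total activity rate.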


Table 5 presents a summary of the
results relating personality to task
activity. In the three relationships
of intelligence, adjustment, and
masculinity to task activity the results
based upon percentages are shown
beneath the results based upon the
raw numbers of task contributions.

TABLE 5
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND TASK ACTIVITY

Personality    No. of   No. of      Positive          Negative                 %       % of Sig.   % Sig. &
Factors        Studies  Results  Sig.  N.S.  Unt.  Sig.  N.S.  Unt.  [?]   Positive   Positive    of Trend
Intelligence
  raw             4       45      13    23     0     1     8     0     0    80 (45)    93 (14)     29 (45)
  %              [?]      13       0     3     0     0    10     0     0    23 (13)    -- (0)      [?] (13)
Adjustment
  raw             4       19       7    10     0     0     2     0     0    89 (19)   100 (7)      37 (19)
  %               2       20       1     7     0     0    12     0     0    40 (20)   100 (1)      [?] (20)
Extroversion      4       19       6     9     0     0     3     0     1    83 (18)   100 (6)      32 (19)
Dominance         3        8       3     3     0     0     2     0     0    75 (8)    100 (3)      38 (8)
Masculinity
  raw             1       21       5    12     0     2     2     0     0    81 (21)    71 (7)      24 (21)
  %               1        3       0     1     0     0     2     0     0    33 (3)     -- (0)      [?] (3)
Conservatism      4       12       3     3     0     2     2     0     2    60 (10)    60 (5)      30 (10)


Intelligence. Whereas 80% of the
results relating intelligence to the
raw number of task contributions are
in the positive direction, only 23% of
the results relating intelligence to the
percentage of task contributions are
positive. Apparently, the finding
based upon raw numbers is highly
dependent on the correlation between
the number of task contributions
and the total amount of activity. As
noted above, the higher the
individual’s intelligence, the more likely
he is to be a high participator. Since 
the total number of acts initiated and 
the number of task contributions ini- 
tiated are almost perfectly correlated, 
a positive association between intelli- 
gence and the raw number of task 
contributions could have been pre- 
dicted. However, the negative rela- 
tionship between intelligence and 
concentration of activity in the task 
area was unexpected; 10 out of 13 
correlations are in the negative direc- 
tion. It appears that although in- 
telligent individuals talk more than 
less intelligent individuals, they con- 
centrate less of their total activity in 
the area of task contributions (Bor- 
gatta, 1953; Carter & Nixon, 1949; 
Cattell & Stice, 1954; Miller, 1939; 
Slater, 1955a). 

Adjustment. The relationship be- 
tween personal adjustment and task 
activity depends upon the contribu- 
tion of total activity rate to the re- 
sults. When the raw number of task 


contributions is employed, the rela- 
tionship is positive; when task ac- 
tivity is measured in terms of the per- 
centage of total activity, the relation- 
ship fails to hold. Actually, the 
trend of the results is slightly nega- 
tive, but the only significant result is 
positive. When the factor of total 
activity rate is controlled, the strong 
positive relationship between adjust- 
ment and task activity is reduced to 
a low negative relationship (Bor- 
gatta, 1953; Carter & Nixon, 1949; 
Cattell & Stice, 1954; Miller, 1939; 
Slater, 1955a). 

Extroversion. Extroversion is
positively related to the raw number of
task contributions, but it is not possi- 
ble to partial out the total activity 
factor underlying these results. It 
may at least be suspected that the 
relationship would be altered if ex- 
troversion were related to task ac- 
tivity, holding activity rate constant 
through statistical controls (Bor- 
gatta, 1953; Carter & Nixon, 1949; 
Cattell & Stice, 1954; Miller, 1939). 

Dominance. The studies reviewed
contain only eight results bearing on 
the relationship between dominance 
and the number of task contribu- 
tions. Six of the results are positive 
and the three significant results are 
positive (Borgatta, 1953; Carter & 
Nixon, 1949; Cattell & Stice, 1954). 

Masculinity-femininity. Masculinity
has been related to task activity in
only two studies (Carter & Nixon,
1949; Slater, 1955a). The one study
which related masculinity to the raw
number of task contributions found
81% of the results to be positive. On
the other hand, the study employing 
percentages of total activity as the 
measure of task contributions found 
only one out of three results to be 
positive. The extent to which mas- 
culinity relates to an individual’s 
tendency to concentrate his activity 
in the task area remains largely un- 
known. 

Conservatism. Four studies (Carter 
& Nixon, 1949; Cattell & Stice, 1954; 
Haythorn et al., 1956a, 1956b) have 
examined the relation between con- 
servatism and the amount of task ac- 
tivity initiated. A slight positive 
trend emerges from the data,
indicating that conservative or
authoritarian individuals tend to give more task
contributions than nonauthoritarian 
individuals. However, there are too 
few results to establish this trend as 
significant. 

Summary. A serious difficulty
underlies the attempts to relate
personality variables to task activity
when the latter is measured in terms
of the raw number of task
contributions. The results are not
independent of the relationship between
personality variables and total activity
rate. Adjustment, extroversion,
masculinity, intelligence, dominance, and
conservatism are all found to be
positively related to the raw number of
task contributions, but the
relationships are reversed in the three cases
where it is possible to control for total
activity rate by using percentages.
Intelligence, adjustment, and mascu- 
linity are negatively related to the 
proportion of a man’s total activity 
which falls within the area of task 
contributions. It must be admitted 
that the reversals are not uniformly 
convincing; the pattern of results for 
adjustment vs. task activity is mixed, 
and there are only three results in the
case of masculinity. On the other 
hand, for all their faults, these three 
reversals succeed in raising the ques- 
tion of whether personality variables 
may not relate one way to task ac- 
tivity when measures of it are con- 
founded with the general activity 
factor and quite another way to task
activity when this confounding factor
is removed.


Social-Emotional Activity 


There are two general categories of 
social-emotional activity. Positive 
social-emotional activity includes 
showing agreement, tension release, 
and solidarity; negative social-emo- 
tional activity includes showing dis- 
agreement, tension, and antagonism. 
They are treated separately in this 
section. 

Two aspects of personality,
intelligence and adjustment, have been
related to social-emotional activity a
sufficient number of times to warrant
detailed analysis. Fortunately, the
essential difficulty with the results on
task activity, the confounding of the
total activity factor with the results
for a segment of an individual’s total
behavior, is not a problem here. On
the one hand, the correlation
between the total number of positive or
negative social-emotional acts and
the total number of acts initiated is
low. On the other hand, Borgatta
(1953) and Slater (1955a) have
related intelligence and adjustment to
the proportion of a man’s total ac- 
tivity which falls within the positive 
or negative social-emotional areas. 
In addition, Borgatta has related in- 
telligence to the raw number of posi- 
tive and negative social-emotional 
acts. Table 6 presents a summary of 
the relationships of intelligence to 
positive and negative social-emo- 
tional activity, measured in terms of 
raw amounts and percentages, and 
then the relationship of adjustment 
to positive and negative social-emo- 
tional activity, measured only in 
terms of percentages of the total 
amount of activity. 

Intelligence. Despite the low num- 
ber of results, there is a trend emerg- 
ing from the data. Intelligence meas- 
ures are positively related to both 
the total number of positive social- 
emotional acts and the percentage of 
total activity falling in this area. On 
the other hand, intelligence is nega- 
tively related to the two correspond- 
ing measures of negative social-emo- 
tional activity. Controlling for total 
activity by the use of percentages 
does not disturb the trends. In com- 


parison with less intelligent group 
members, the more intelligent indi- 
viduals appear to concentrate more 
of their behavior in the area of posi- 
tive social-emotional activity and less 
in the area of negative social-emo- 
tional activity (Borgatta, 1954; 
Slater, 1955a). 

Adjustment. The individual’s per- 
sonal adjustment is positively related 
to the proportion of his total activity 
which is rewarding or supportive. 
The trend is not very strong, with 
only 59% of the results in the positive 
direction, but when contrasted with 
the relationship between adjustment 
and negative social-emotional ac- 
tivity, the pattern is interesting. Ad- 
justment is negatively related to the 
proportion of a man’s total activity
in the negative social-emotional area. 
On the basis of these data, it appears 
that the better adjusted the individ- 
ual, the more likely he is to initiate 
positive social-emotional acts and the 
less likely he is to initiate negative 
social-emotional acts (Borgatta, 
1954; Slater, 1955a). 

TABLE 6
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND SOCIAL-EMOTIONAL ACTIVITY

[Table entries illegible in the source copy.]


Summary. Social-emotional
activity has received less attention
from researchers than any other
aspect of an individual’s performance
in this review. The scanty
evidence available indicates that the
more intelligent or the better ad- 
justed the individual, the more 
likely he is to concentrate his ac- 
tivity in the positive social-emotional 
area, and the less likely he is to con- 
centrate his activity in the negative 
social-emotional area. 


Conformity 


Beginning with Asch’s (1951) in- 
genious experiment on conformity 
and his suggestions about possible 
personality differences between inde- 
pendent and yielding subjects, a 
number of researchers have been con- 
cerned with the problem of relating 
an individual’s personality to his 
tendency to conform to the opinions 
of others. One special problem arises 
in reviewing the results in this area; a 
considerable number of the results 
depend upon personality measure- 
ments which ask the individual to 
describe himself. While other sec- 
tions of this review contain results 
based upon self-ratings, those results 
have not created any difficulty. In 
the first place, they have always been 
relatively few in number, and, in the 
second place, the trends of results
depending upon self-ratings have been
in close agreement with the trends 
based upon other techniques of meas- 
urement. However, self-ratings on 
adjustment and extroversion do not 
relate to conformity in the same way 
as peer ratings, questionnaires, and 
objective tests. The trends for re- 
sults based upon self-ratings must be 
treated separately in order to obtain 
a valid summary of the findings. A 
further complication, introduced by 
the use of adjective check-lists for 
self-rating, is the tendency for 
authors, under pressure to remain 
brief, to report only the adjectives 
which differentiate conformers from 
nonconformers at some specified 
level of significance. As a result, any 
estimate of the significance of the 
findings is inflated. Moreover, it is 
not possible in the case of those ad- 
jectives which do not yield significant 
results to assess the direction of the 
relationship. Only the trends relating
dominance and conservatism to
conformity are sufficiently free of results
based upon self-ratings to escape 
these criticisms. 

Table 7 presents a summary of the
relationships of adjustment,
extroversion, dominance, and
conservatism to conforming behavior. For
the first two relationships the results
for self-rating techniques are
presented separately from those using
other techniques of personality
assessment.


TABLE 7
THE RELATIONSHIP BETWEEN PERSONALITY FACTORS AND CONFORMITY

Personality     No. of   No. of      Positive          Negative                 %       % of Sig.   % Sig. &
Factors         Studies  Results  Sig.  N.S.  Unt.  Sig.  N.S.  Unt.  [?]   Positive   Positive    of Trend
Adjustment
  self-rating      2       18      13     1     0     1     3     0     0    78         93          72
  other techn.     8       30     [?]   [?]   [?]   [?]   [?]   [?]   [?]    31 (26)   [?] (6)     [?] (24)
Extroversion
  self-rating      2       16      10     1     0     1     4     0     0    69         91          62
  other techn.     5       10       0     0     2     1     1     2     4   [?] (6)    [?] (1)     [?] (8)
Dominance          4        8       0     0     2     2     1     2     1    29 (7)      0 (2)      33 (6)
Conservatism       6       20      16     3     0     0     1     0     0    95 (20)   100 (16)     80 (20)

Adjustment. Those individuals 
who tend to conform to group opin- 
ion also tend to see themselves as 
better adjusted, as indicated by the 
fact that 78% of the results are posi- 
tive. However, peer ratings and per- 
sonality inventories do not yield the 
same results; only 31% of the results 
relating adjustment, as measured by 
techniques other than self-ratings, to 
conformity are positive. One way to 
resolve the dilemma is to assume that 
self-ratings do not measure personal- 
ity as validly as the other measures. 
It may be, for example, that those in- 
dividuals who tend to conform to 
group opinion also tend to conform
to an acceptable personality
characterization in their self-descriptions.
If only the results based upon tech- 
niques other than self-ratings were 
considered, it would be possible to 
conclude that well-adjusted indi- 
viduals are less likely to conform to 
the opinions of others (Barron, 1953; 
Bray, 1950; Cervin, 1955, 1957; 
Hardy, 1954; Hollander, 1954; Kagan 
& Mussen, 1956; Kelman, 1950). 

Extroversion-introversion.
Extroversion is positively related to
conformity in 78% of the results employing
self-ratings; those who conform to 
the opinion of others describe them- 
selves as kind, friendly, helpful, and 
optimistic. However, the results em- 
ploying projective and personality 
inventory variables do not confirm 
this relationship. While there is some 
evidence that extroversion is nega- 
tively related to conformity, a more 
accurate summary would be that the 
few relevant findings are
inconclusive (Barron, 1953; Bray, 1950;
Hardy, 1954; Hoffman, 1953; Kel- 
man, 1950). 

Dominance. Although only eight 
findings bear on the relationship be- 
tween dominance and conformity, 
the trend is negative. Only one of 
these results is based upon self-rat- 
ings. It might be concluded that 
those who yield to group pressures 
are less dominant individuals (Bar- 
ron, 1953; Bray, 1950; Hoffman, 
1953; Kelman, 1950). 

Conservatism. There has been con- 
siderable speculation that there is a 
positive association between con- 
servatism and conforming behavior. 
The burden of evidence suggests that 
conservative, conventional, and au- 
thoritarian subjects are more likely 
to yield to group pressure than radi- 
cal or unconventional subjects. No 
single measure of the conservatism 
dimension emerges as an especially 
potent predictor of conformity in all 
conditions; in fact, there is a sugges- 
tion that it is important to control 
for a number of conditions if the rela- 
tionship is to hold at all. There is 
some indication in these data that 
authoritarian subjects are less likely 
to conform to a small group of peers 
but more likely to conform to either 
a large group of peers or perceived 
superiors (Barron, 1953; Bray, 1950; 
Cervin, 1955, 1957; Hardy, 1954; 
Hollander, 1954; Kagan & Mussen, 
1956; Kelman, 1950). 

Summary. Those who are more 
likely to conform to group opinion 
see themselves as better adjusted and 
more extroverted. Only in the pre- 
diction of conformity do personality 
measures other than self-ratings con- 
tradict the results based on self- 
description. By such measures as 
ratings by others, projective tests, 
and personality inventories,
adjustment and extroversion are
negatively related to conformity. In
addition, there is a slight indication that
dominance is negatively related to 
conformity. The relationship be- 
tween conservatism and conformity 
has received considerable attention, 
and the data tend to confirm the 
hypothesis that conservative indi- 
viduals are more likely to conform to 
the opinions of others. 


Situational Factors 


A number of relationships between 
personality variables and individual 
behavior and status variables appear 
to be well established. It is possible 
to extend the analysis one step fur- 
ther and inquire whether the magni- 
tude of these relationships varies 
under certain conditions. This sec- 
tion examines the extent to which 
situational factors affect the relation- 
ships between personality character- 
istics and performance in groups, in 
an effort to understand the limiting 
conditions within which the relation- 
ships operate. 


A situational factor represents a
condition of research about which
some decision must be made, but
which, once the decision is made,
may affect the generality of the re- 
sults. Four examples, for which ade- 
quate data are available in these 
studies, are selected for analysis in 
this section: (a) the nature of the pop- 
ulation from which the sample is 
drawn; (b) the sex of the group mem- 
bers; (c) the history of the group 
prior to observation; and (d) the size 
of the group. The first example con- 
trasts four populations; these are 
high school students, college under- 
graduates, military personnel, and 
all other adults. The second
contrasts groups composed entirely of
males with those entirely of females;
mixed groups are omitted from the
analysis of situational factors. The
third example compares ad hoc
experimental groups with natural,
ongoing groups. In the majority of
cases, the former are composed of 
persons unknown to each other be- 
fore the period of observed interac- 
tion, and usually the group exists 
entirely for the purposes of research. 
In contrast, natural groups are com- 
posed of acquaintances and exist for 
many reasons other than to be 
studied. The fourth example com- 
pares groups made up of seven or 
fewer members with larger groups of 
eight or more. In treating each situa- 
tional factor the effort will be to de- 
termine whether the relationship be- 
tween the individual’s personality 
and his performance in the group re- 
mains constant across the various 
conditions. 

There are sufficient data to study 
only the relationships of intelligence, 
adjustment, and extroversion to one 
dependent variable, leadership. Com- 
parisons are made between conditions 
of research by determining, for each 
relationship, the percentage of re- 
sults within each study which are 
positive; these percentages are then
averaged over studies employing the 
same conditions of research. The ad- 
vantage of averaging the proportion 
within separate studies is that each 
study is thus weighted equally in the 
final statistic, the percentage of posi- 
tive results. It may be remembered 
that intelligence, adjustment, and ex- 
troversion all bear a strong positive 
relationship to leadership; the ques- 
tion is whether the strength of these 
relationships varies with the
conditions of research under which they
are obtained.
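
The effect of averaging within-study percentages, rather than pooling all results, can be seen in a brief sketch. The study counts are invented for illustration and the function names are hypothetical; only the averaging rule itself comes from the text.

```python
from statistics import mean


def study_percentage(positive: int, total: int) -> float:
    """Percentage of one study's results that are positive."""
    return 100.0 * positive / total


def mean_of_study_percentages(studies) -> float:
    """Average the within-study percentages, so every study counts equally."""
    return mean(study_percentage(p, n) for p, n in studies)


def pooled_percentage(studies) -> float:
    """Pool all results, so studies reporting more results count for more."""
    positive = sum(p for p, _ in studies)
    total = sum(n for _, n in studies)
    return 100.0 * positive / total


# Hypothetical studies under one condition: (positive results, total results).
studies = [(9, 10), (1, 2)]

print(mean_of_study_percentages(studies))    # equal weight per study
print(round(pooled_percentage(studies), 1))  # dominated by the larger study
```

With these invented counts the equal-weight figure is 70%, while pooling gives about 83%, because the ten-result study swamps the two-result study.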


The nature of the population. Over
two-thirds of studies relating
intelligence, adjustment, or extroversion
to leadership draw their groups from
the college population. One
consequence of such a sampling bias is that
relatively little data can be found to
determine whether a distorted view
of the relationships involved has
been created. There appears to be
little variation in the relation of
intelligence to leadership
across populations; the proportion of
positive results in military groups is
89% and in adult groups, 100%.
However, adjustment is more
strongly related to leadership in
undergraduate and adult groups than
in high school and military groups,
the proportions varying from nearly
90% in the former to nearly 70% in
the latter. These differences may
reflect the influence of a number of
underlying factors; for example, it
may be that as the age, education, or
social class of the group members
increases, adjustment is more strongly
related to leadership. Finally,
extroversion is less strongly related to
leadership among high school
students (61% positive results) than
among the three remaining
populations (over 80% positive results).

The sex of the group. The number 
of studies which examine all-male 
groups is almost identical with the 
number examining all-female groups. 
There is little difference between male 
and female groups in the way intelli- 
gence relates to leadership; 89% of 
the results are positive in the male 
groups and 86% are positive in the 
female groups. Relating adjustment 
to leadership, 83% of the results are 
positive in male groups, 62% are 
positive in female groups. In con- 
trast, 69% of the results are positive 
when extroversion is related to lead- 
ership in male groups, while 85% are 
positive in female groups. Thus, ad- 
justment is more positively related 
to leadership in male groups than in 
female groups, whereas extroversion 
is more positively related to leader- 
ship in female groups. 

History of the group. When groups 
which have interacted prior to
observation are compared with groups
without prior interaction, no differ- 
ences of any magnitude emerge in the 
relationships of intelligence, adjust- 
ment, and extroversion to leadership. 
The differences between the percent- 
ages of positive results in experi- 
mental and natural groups do not ex- 
ceed 4%. Apparently the way these 
three aspects of personality relate to 
leadership status does not vary as a 
result of studying either experi- 
mental or natural groups. 

Size of the group. In studying the 
effect of variations in group size, the 
necessity of holding constant an- 
other condition of research, the 
history of the group, requires that 
only experimental groups be consid- 
ered here since in these studies nat- 
ural groups are in every case larger 
than experimental groups. The ex- 
perimental groups range in size from 
three to 10 members. Dividing them 
at the median size, seven and one- 
half, it appears that size alone does 
not strongly affect the relationships. 
However, the slight trend is intri- 
guing. Intelligence is more strongly 
related to leadership in smaller groups 
than in larger groups, while adjust- 
ment is more strongly related to lead- 
ership in larger groups. It may be 
hypothesized that, at least within the 
range of group size considered here, 
as the size of the group increases, in- 
ternal and integrative problems be- 
come more important, relative to the 
external or adaptive problems of the 
group. It may be that the need for 
an integrative leader in the larger 
groups makes it more likely that a 
well-adjusted individual will be se- 
lected, whereas the greater need for 
an adaptive leader in the smaller 
groups makes the intelligent indi- 
vidual more likely to be chosen. 
These differences are only matters of 
degree, since the relationships are
positive in both large and small groups.

Summary. Four decisions which
must be made before any research 
can be undertaken concern the na- 
ture of the population, the sex of the 
group members, the previous his- 
tory and the size of the groups. Of 
these, only the previous history ap- 
pears to make no difference in the 
way intelligence, adjustment, and ex- 
troversion are related to leadership 
status. To a slight extent, intelli- 
gence is more positively related to 
leadership in smaller experimental 
groups than in larger ones.
Adjustment is more positively related to
leadership in undergraduate and 
adult groups, in male groups, and in 
larger groups, while it is less strongly 
related to leadership in high school 
and military groups, in female 
groups, and in smaller groups. Extro- 
version is less strongly related to 
leadership in high school groups than 
in college, military, or adult groups, 
and it is more strongly related to
leadership in female groups than in
male groups. However, two points
should be emphasized. The strong
positive relationships between these
three aspects of personality and
leadership status are not reversed under
any variation in the conditions of
research; the differences are only a
matter of degree, and many of the
differences are slight.


Summary and Conclusion 


This review has examined a
number of relationships between the
personality characteristics of the
individual and the way he behaves or is
perceived in groups. Seven aspects of 
personality were selected for study; 
all but one were chosen on the 
grounds that factor analytic studies 
of personality had repeatedly demon- 
strated their importance. Roughly 
350 out of over 500 different
personality variables were then categorized
as measures of the seven dimensions.
Six aspects of the behavior and status 
of the individual were selected, pri- 
marily on the basis of the labels and 
operations of the measures. The rele- 
vant findings on the 29 relationships 
for which adequate data were avail- 
able were then examined. Finally, an 
examination was made of the effect of 
four situational factors or conditions 
of research on three of the relation- 
ships. 

Any attempt to evaluate the con- 
clusiveness of this review should 
take a number of considerations into 
account. To the extent that contra- 
dictory or insignificant findings have 
been either overlooked by the re- 
viewer or unpublished by the re- 
searcher, the trends may be inaccu- 
rate. A considerable number of un- 
published data are included in this 
review, but the impact of the remain- 
ing unpublished data cannot be 
known. Secondly, the selection of 
factors and especially the location of 
variables within the factors involved 
decisions which future research may 
prove ill-advised. Every effort was
made to use evidence other than the 
author's original label for a variable, 
but the present state of knowledge in 
the field of personality assessment 
leaves much to be desired. Finally, 
the decision to examine the total
pool of results for each relationship
involved a risk that the statistical
interdependence of the measures
would bias the results. Whether the
proportion of results which are
positive for any relationship would
increase or decrease if based upon
independent samples cannot be
determined.

The best predictor of an
individual’s performance in groups is
intelligence. In order of the proportion of
positive results, intelligence is found
to be positively related to total
activity rate, leadership, and
popularity. In addition, it is positively
related to the number of task
contributions made by an individual, but,
controlling for the total activity rate, 
it is negatively related to the propor- 
tion of his total activity falling in the 
task contribution area. Intelligence 
is positively related to both the 
amount and proportion of positive 
social-emotional activity, whereas it 
is negatively related to negative 
social-emotional activity. 

Adjustment is found to be posi- 
tively related to leadership, popu- 
larity, and total activity rate, in that 
order. It is positively related to the 
total number of task contributions, 
but negatively related to the per- 
centage of the total number of acts 
which are task contributions. Ad- 
justment is positively related to posi- 
tive social-emotional activity and 
negatively related to negative social- 
emotional activity. Although sub- 
jects who tend to conform to group 
opinion see themselves as better ad- 
justed, other measuring techniques 
indicate a negative relationship be- 
tween adjustment and conformity. 
Except for the fact that intelligence 
has not been related to conformity a 
sufficient number of times to be re- 
viewed, adjustment is related to be- 
havior and status variables in much 
the same way as intelligence. 

Extroversion is positively related 
to popularity, total activity rate, and 
leadership. Although it is positively 
related to the total number of task 
contributions, this result cannot be 
considered as independent of the re- 
lation between extroversion and total 
activity rate. While individuals who 
conform more than others to group 
opinion tend to see themselves as 
more extroverted, techniques other 
than self-ratings fail to show any sig- 
nificant association. 


Dominance is positively related to 
the total number of task contribu- 
tions initiated and to leadership. It 
is negatively related to an indi- 
vidual’s tendency to conform to 
group opinion. 

Masculinity bears a low positive 
relationship to leadership and popu- 
larity. A positive association is found 
between masculinity and the total 
number of task contributions, al- 
though there is some slight indica- 
tion that this relationship is reversed 
when the total activity factor is con- 
trolled. 

Conservatism is negatively related 
to leadership, but positively related 
to popularity. A positive association 
is found between conservatism and 
the raw number of task contribu- 
tions. Finally, conservatism is posi- 
tively related to conformity; those 
who tend to conform to the opinions 
of others are more conservative or 
authoritarian. 

The measures of interpersonal 
sensitivity relate positively to both 
leadership and popularity. Both rela- 
tionships are of low magnitude, and, 
in the case of the latter, there is a 
possibility that the number of the re- 
sults which are spuriously positive is 
sufficient to cast doubt on the whole 
trend. 

In reviewing the results for the 
seven aspects of personality, refer- 
ence has been made to the percentage 
of the total number of results which 
are positive. In the majority of cases, 
when only the significant results are 
examined, the strength of the trend 
increases. Moreover, several rela- 
tionships have been examined in a 
sufficient number of studies to permit 
the use of the trend for a single study 
as the unit of research; in each case 
the proportion of studies yielding 
positive trends was significantly dif- 
ferent from chance. 

A number of conditions of research
are found to influence the relation- 
ship of intelligence, adjustment, and 
extroversion to leadership. When 
different techniques of measuring 
leadership, different populations, and 
groups of different sizes are used in 
the research, the relationships are al- 
tered. It appears to make no differ- 
ence to the relationship obtained 
whether the group members were 
acquainted with one another before 
the period of observation and meas- 
urement. It should be noted that in 
no case was the positive relationship 
of intelligence, adjustment, and ex- 
troversion to leadership reversed in 
direction. 

Throughout this review the results 
have been used to examine the direction
of the various associations between
personality characteristics and
measures of behavior or status.
Occasionally, however, the results
contained a sufficient number of
correlations to afford some estimate of the
magnitude of the relationship. In no 
case is the median correlation be- 
tween an aspect of personality cov- 
ered here and performance higher 
than .25, and most of the median cor- 
relations are closer to .15. 

In conclusion, it may be noted that 
the relationships reviewed are by no 
means the only ones to which atten- 
tion has been and should be directed. 
It is encouraging to note, however, 
that many clear and significant 
trends emerge when the body of research
on these relationships is considered
as a whole.

In recent years, studies of cerebral
organization in man have directed 
attention to the differential localiza- 
tion of learning and memory func- 
tions in the cerebrum (Milner, 1954, 
1958). Lashley’s (1929) early argu- 
ments in favor of mass action, though 
tacitly accepted, are seen to have lim- 
ited generality because they are 
based upon the effects of cortical ab- 
lations restricted almost exclusively 
to the rat. It has become increasingly 
important to coordinate the animal 
data and attempt, once again, a rec- 
onciliation of the differing theoretical 
points of view. Whether the discrep- 
ancies of data can be accounted for 
on the basis of species differences in 
cerebral organization can only be 
judged on the basis of a systematic 
comparative study. Intermediate 
species have been neglected and the 
theoretical neurological issues hinge 
upon data largely restricted to rat 
and man. Only in the past decade 
has there been seen a great spurt of 
research on monkeys as an intermedi- 
ate phylogenetic group. 


1 The experimental work reported herein
was carried out at the Yerkes Laboratories
of Primate Biology and at the Montreal
Neurological Institute during tenure of a
National Research Council Fellowship in
the Medical Sciences. Preparation of the
review was supported in part by research
grant M1442 from the National Institute of
Mental Health, Public Health Service, and
in part from the State of Illinois Mental
Health Fund Grant 1711.

Lashley’s main argument in support
of his theory of mass action has
been the alleged nonspecific functioning
of sensory and motor tissue
(1943). Recently, the present writer
studied the effects of occipital lesions
in the monkey, with particular emphasis
upon the effects on nonvisual
as well as visual learning and retention
(Orbach, 1955a, 1955b, 1959).
These data will be presented in an 
attempt to fill the gap between rat 
and man. The review will focus on 
the anatomy and functional organ- 
ization of striate cortex in the mon- 
key. This will be followed by an 
evaluation of the comparative data 
on destruction of striate cortex in 
rat and other infraprimate species, 
monkey, and man. The core of the 
paper will review the nonvisual ef- 
fects of striate cortex lesions, but 
there will be included sufficient refer- 
ence to visual sequelae of the lesions 
to place the nonvisual core in perspective.
It will be concluded on the
basis of these data that the theoreti- 
cal neurological issue devolves not 
simply upon species differences in 
functional organization of cortical 
tissue. Rather, the fundamental 
problem in the way of a reconcilia- 
tion of the differing points of view 
lies in a neglect of a precise definition 
of “psychological function.”


Identification of Striate Cortex 


The cortical sector of the primate 
occipital lobe, known as the area 
striata (Elliot Smith), is considered 
one of the most highly differentiated 
regions of the entire cerebral cortex. 
The white stripe of Gennari, conspicuous
to the unaided eye even in
unstained tissue, is a remarkably reliable
criterion in its identification.
Moreover, the striate area shows the 
best laminar differentiation of any 
cortical region (Bonin, 1942). As a
result, it is sharply delimited from 
the adjacent peristriate cortex. 

Nissl-stained material reveals that 
striate cortex is the most pronounced 
example of konio-cortex; it contains 
small cells throughout with few ex- 
ceptions and displays a broad inner 
granular layer, at least one-quarter 
the thickness of the cortex. In the 
monkey, the second and third layers 
are almost indistinguishable. Ac- 
cording to Bonin, the fourth layer is 
divided into two parts, an outer 
lighter stratum consisting mostly of 
granular cells, and an inner denser 
stratum. It contains the telodendria 
of specific afferents from the lateral 
geniculate bodies. The organization
of the fourth layer, with its outer band
of Baillarger (the line of Gennari),
is what makes the appearance of
striate cortex so distinctive. The
fifth stratum, the lightest of this area, 
consists of a sparse population of 
small pyramidal cells and occasional 
solitary cells of Meynert. The sixth 
layer forms a conspicuous band at 
the bottom of the cortex. It consists 
of the usual fusiform cells and gran- 
ules as well as solitary cells of Mey- 
nert. The reader is referred to Bonin 
(1942) for histological details. 

The surface topography of the 
brain of Macaca mulatta provides re- 
liable landmarks of the boundaries of 
striate cortex. The lunate and infe- 
rior occipital sulci represent the ante- 
rior and ventral margins of the lateral 
(opercular) striate cortex which ex- 
tends to within a millimeter or two of 
these sulci. There may be an acces- 
sory lateral calcarine sulcus but it is 
rarely more than superficial, in con- 
trast to the deep sulcus or sulcus- 
complex in the brain of Ateles which 
contains almost one-quarter of the
total striate cortex (Lashley, 1948). 
Most of the medial striate tissue is 
buried in the calcarine fissure and its 
superior and inferior branches. 

The distribution of striate cortex 
in relation to the remaining neocortex 
of one macaque hemisphere is illus- 
trated more precisely in Fig. 1. The 
surface area of the cortex for each sec- 
tion was plotted in arbitrary units. 
In this hemisphere, total striate cor- 
tex was 10.6% of the total neocortex; 
its surface area was estimated as 575
mm.² The tissue was distributed over
the occipital region as follows: 


Lateral surface (from dorsal convexity to inferior margin)   34.6%
Calcarine fissure and branches                               48.4%
Medial surface (extra-calcarine cortex)                      17.0%


No correction was applied for dif- 
ferential shrinkage and distortion. 
These figures agree very closely with 
those reported by Filimonoff (1933) 
for cercopithecus, Bonin (1942) for 
cebus, Lashley and Clark (1946), 
Lashley (1948), and Chow, Blum, 
and Blum (1951) for spider monkey 
and macaque. 
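
As a rough check on the figures above, the percentage distribution can be converted into absolute surface areas. The sketch below is an illustration only and not part of the original analysis; it uses the reported estimate of 575 mm.² and the three reported percentages, and names such as `region_areas` are hypothetical.

```python
# Illustrative only: the figures are those reported in the text for one
# macaque hemisphere; the names used here are hypothetical.
STRIATE_AREA_MM2 = 575.0             # estimated striate surface area
STRIATE_SHARE_OF_NEOCORTEX = 0.106   # striate cortex as a fraction of neocortex

DISTRIBUTION_PCT = {
    "lateral surface": 34.6,
    "calcarine fissure and branches": 48.4,
    "medial surface (extra-calcarine)": 17.0,
}


def region_areas(total_mm2, distribution_pct):
    """Convert a percentage distribution into absolute areas in mm^2."""
    return {name: total_mm2 * pct / 100.0 for name, pct in distribution_pct.items()}


areas = region_areas(STRIATE_AREA_MM2, DISTRIBUTION_PCT)
# Neocortical surface implied by the 10.6% share (an inference, not a reported value).
implied_neocortex_mm2 = STRIATE_AREA_MM2 / STRIATE_SHARE_OF_NEOCORTEX

if __name__ == "__main__":
    for name, area in areas.items():
        print(f"{name}: {area:.1f} mm^2")
    print(f"implied total neocortex: {implied_neocortex_mm2:.0f} mm^2")
```

On these figures roughly 278 mm.² of striate cortex lies within the calcarine fissure and its branches, against about 199 mm.² on the lateral surface.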

Though defined cytoarchitectur- 
ally, the striate cortex has also been 
studied as: 

1. An afferent projection area 
where fibers originating in the lateral 
geniculate bodies terminate in an accurate
point-for-point fashion (Polyak
[1932], using the Marchi technique, and
Walker [1938], using the retrograde
degeneration technique).

2. A cortico-fugal projection area
sending fibers to the adjoining cortex
(Bonin, Garol, & McCullough, 1942;
Clark, 1941) and to subcortical
nuclei such as the lateral geniculate
bodies and nuclei of the eye muscles
(Mettler, 1935).

3. The site of local action potentials
evoked by photic stimulation of
circumscribed portions of the retina.
Macular cortex only has been explored
and mapped in the monkey
(Marshall & Talbot, 1942). The extent
to which single units in striate
cortex respond to photic bombardment
is still to be explored systematically.

4. An electro-stimulable area, stimulation
of which produces predictable
eye movements in the monkey
(Crosby & Henderson, 1948; Walker
& Weaver, [...]) [...] visual impressions
[...] patients (Foerster, [...]
Jasper, 1954).

These studies are incomplete and
it is not yet possible to state with certainty
whether all these methods
trace boundaries which coincide.
However, no discrepancies have been
reported and the available data suggest
that all parts of striate cortex
are structurally similar and participate
in the same function, namely,
vision.

Fig. 1. Cortical perimeter in arbitrary units for serial coronal sections of one
macaque hemisphere. Perimeter was relatively constant between sections 37 and 48, and
these 11 sections are accordingly omitted from the figure.
[Figure: perimeter plotted against section number, with striate cortex indicated.]


Subdivisions of Occipital Cortex 


Elaborate regional subdivisions of 
striate cortex have occasionally ap- 
peared in the literature (Beck, 1934; 
Ngowyang, 1934; Solnitzky & Har- 
mon, 1946). One recent study by 
Solnitzky and Harmon attempted to 
relate structural characteristics such 
as thickness differences to habits 
of nocturnality and diurnality. In 
measurements of serial sections of 
their primate material, these workers 
found that the over-all thickness of 
the central sector was 45% greater 
than in the peripheral sector. How- 
ever, the lateral convexity of the 
specimen examined by Lashley and 
Clark (1946) was only 17% thicker 
than the cortex lining the walls of the 
calcarine fissure. Moreover, it is
well-recognized that cortex within a 
fissure tends to be thinner, probably 
as a result of mechanical forces alone. 

Another factor which contributes 
to the large apparent differences in 
cortical thickness is the angle of cut 
of the tissue. Coronal sections cut 
through the lateral occipital surface 
at a sharp angle, and at the same 
time through the medial surface more 
perpendicularly. If cortical tissue 
were cut at 45°, for example, the tis- 
sue would appear thicker, by a factor 
of √2, or 41%, than tissue equally
thick but cut perpendicularly. 
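
The 41% figure follows from simple geometry, which can be sketched numerically. The example below assumes an idealized flat slab of cortex of uniform thickness and defines the angle as the deviation of the cut from the perpendicular; both the model and the function name are assumptions made for illustration.

```python
import math


def apparent_thickness(true_thickness, oblique_angle_deg):
    """Thickness measured on a section cut oblique_angle_deg away from perpendicular.

    Assumes a flat slab of uniform thickness; a perpendicular cut is 0 degrees.
    """
    if not 0 <= oblique_angle_deg < 90:
        raise ValueError("angle must be in [0, 90) degrees")
    return true_thickness / math.cos(math.radians(oblique_angle_deg))


ratio_at_45 = apparent_thickness(1.0, 45.0)      # equals sqrt(2) up to rounding
percent_increase = (ratio_at_45 - 1.0) * 100.0   # about 41%

if __name__ == "__main__":
    print(f"ratio at 45 degrees: {ratio_at_45:.3f}")
    print(f"apparent increase: {percent_increase:.0f}%")
```

Under this model the apparent thickening grows rapidly as the cut departs further from the perpendicular, so oblique sectioning alone could plausibly account for differences of the size reported.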

On the basis of their review, Lash- 
ley and Clark concluded that “no 
valid evidence for the existence of 
structural and functional subdivisions
of the area, other than those resulting
from the topographic projection
of the retina, has been presented”
(1946, p. 278). It seems unlikely
that differences in over-all
thickness have any functional signifi- 
cance. 

In contrast, regional differentia- 
tion of striate cortex based on a study 
of topographic projection and physi- 
ologic analysis has been amply con- 
firmed. The opercular cortex in the 
macaque is known to represent the 
region of greatest acuity (the inner 
eight degrees of the retina according 
to Marshall and Talbot [1942]), while 
calcarine cortex, by elimination, is 
regarded as mediating peripheral, 
scotopic vision. 

It is interesting to note that the 
cortical area of representation for the 
peripheral retina is larger than that 
for the macula. In Table 1 are listed 
estimates of the percentages of unde- 
generated neurons in the lateral ge- 
niculate bodies after massive resec- 
tions of striate cortex in two ma- 
caques. The only residual striate cor- 
tex was in the depth of the calcarine 
fissure, the area to which the periph- 
eral retina projects. Table 1 shows 
that the degeneration in the lateral 
geniculate body is more complete 
than expected on the basis of the ex- 
tent of striate cortex removed. These 
figures imply that the projection of a 
lateral geniculate neuron to the cal- 
carine fissure covers a larger surface 
area than the projection to the lateral 
surface. This conclusion was also 
reached by Chow, Blum, and Blum 
(1951) as a result of a different line of 
reasoning. If mass of residual striate 
cortex is substituted for surface area, 
the differences between cortex and 
lateral geniculates in Table 1 are re- 
duced, but the conclusion remains 
unaltered, that cortical representa- 
tion seems not to be proportional to 
use or acuity in the case of genicu- 
late-striate relationships. 
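
The inference can be made concrete with two of the hemisphere entries from Table 1. The sketch below is illustrative only: it takes the left hemisphere of B3 and the right hemisphere of LC4, treats the table's reference values (575 mm.² of striate cortex and 1,000,000 geniculate neurons) as exact, and uses hypothetical names throughout.

```python
TOTAL_STRIATE_MM2 = 575.0
TOTAL_GENICULATE_CELLS = 1_000_000

# (residual calcarine striate cortex %, undegenerated geniculate cells %)
SELECTED_ENTRIES = {
    "B3 left": (2.4, 0.58),
    "LC4 right": (2.8, 0.37),
}


def relative_cortex_per_neuron(cortex_pct, cells_pct):
    """Cortex per surviving neuron, relative to the whole-system average.

    A value above 1 means each surviving geniculate neuron corresponds to more
    striate surface than the overall average of total area / total neurons.
    """
    area = TOTAL_STRIATE_MM2 * cortex_pct / 100.0
    cells = TOTAL_GENICULATE_CELLS * cells_pct / 100.0
    average = TOTAL_STRIATE_MM2 / TOTAL_GENICULATE_CELLS
    return (area / cells) / average


ratios = {k: relative_cortex_per_neuron(*v) for k, v in SELECTED_ENTRIES.items()}

if __name__ == "__main__":
    for hemisphere, ratio in ratios.items():
        print(f"{hemisphere}: {ratio:.2f} times the average area per neuron")
```

Both ratios exceed 1 (roughly 4 and 7.6), which is the sense in which the residual calcarine cortex is large relative to the number of surviving geniculate cells.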

TABLE 1
RESIDUAL STRIATE CORTEX IN THE DEPTH OF THE CALCARINE FISSURE AND
CORRESPONDING UNDEGENERATED LATERAL GENICULATE NEURONS

              Left Hemisphere                  Right Hemisphere
          Residual         Lateral         Residual         Lateral
          Calcarine        Geniculate      Calcarine        Geniculate
Subject   Striate Cortex   Cells           Striate Cortex   Cells
B3        2.4              0.58            [illegible]      1.31
LC4       0                0               2.8              0.37

Note.—The entries are expressed as percentages of 575 mm.² and 1,000,000, respectively.

A band of cortex surrounding the
primary visual area (Areas 18 and 19
of Brodmann; OB and OA of Economo;
visuopsychic areas of Campbell)
has traditionally been regarded
as a visual “association” area.
Secondary disorders such as differential
loss of color and depth perception,
and visual agnosia have been
ascribed by clinicians to lesions in this
area. On the one hand, electrical
stimulation and neuronographic
studies tend to support this view
(Bonin et al., 1942; Crosby & Henderson,
1948); on the other hand, the
weight of the behavioral evidence of
the effects of removal of Areas 18 and
19 in monkeys does not (Chow &
Hutt, 1953). Lashley (1948) and
Chow (1951) found virtually no impairment
of visual abilities following
prestriate cortex lesions. The monkeys
were tested pre- and postoperatively
on brightness, color, contour,
and pattern discrimination, but only
one of 10 monkeys showed retardation
in relearning differential responses
to colors and patterns. These
results do not support the hypothesis
of a separate visual integrative center
in prestriate tissue. A similar lack of
postoperative effect was reported by
Riopelle et al. (1951) and Meyer et
al. (1951). In contrast, Ades (1946)
and Ades and Raab (1949) did find
some retardation in learning of color,
shape, and pattern discriminations,
but their reports lack histological
verification of lesions. It should be
noted that none of their animals
failed to recognize food or other
familiar objects in the home environment.
Thus, attempts to reproduce
experimentally the symptoms described
in the clinical literature have
not been successful. The functions of
peristriate cortex in the monkey remain
to be elucidated.

In view of its remarkable histological
and electrophysiological differentiation
from the remaining neocortex,
it is surprising to find claims that
striate cortex and, by implication,
peristriate cortex, participate in
nonspecific, nonvisual functions. The
following sections will attempt to
show that the behavioral evidence
strongly suggests nonvisual as well
as visual functioning of occipital cortex.


BEHAVIORAL EFFECTS OF REMOVAL
OF STRIATE CORTEX


Studies of neocortical areas have
given evidence for a precise structural
and functional differentiation
of striate cortex. At the same time,
there has been revealed for this cerebral
area a curious lack of functional
specificity. The behavioral work suggesting
nonvisual functioning of striate
cortex has been limited almost
exclusively to the rat, and this ani- 
mal has proven admirably suited for 
these ablation studies. However,
there are dangers in generalizing
from rat to man; and this has led,
during the past two decades, to the 
increasing use of infrahuman pri- 
mates, especially monkeys, in similar 
studies of cortical organization. Up 
to the present, little effort has been 
made to coordinate and integrate the 
ablation data available for these spe- 
cies. Though the focus of this review 
is on monkey and rat, selected studies 
of cat, dog, and man will be cited to 
complete the picture. 


Visual Effects 


The early ablation studies by 
Flourens (1824) first established that 
vision depends upon the integrity of 
the cerebral cortex (in pigeons, cats, 
and dogs). With improved surgical 
techniques, more circumscribed corti- 
cal lesions were made and the results 
suggested a coincidence of the “visual
area” with the histologically defined 
area striata (Henschen, 1896; Luciani, 
1884; Munk, 1881; Schafer, 1888; 
Wilbrand, 1890). Controversies in 
the historical development of this 
view are reviewed by Marquis (1934). 

In the rat, it is now well-estab- 
lished that complete destruction of 
striate cortex abolishes the capacity 
for detail vision (Lashley, 1931). Ab- 
lation of any other cortical sector has 
no effect on visual performance. It 
has also been shown for the rat that 
small remnants of intact striate cor- 
tex, tissue representing the binocular 
field, can mediate visual sensitivity to 
details (Lashley, 1939). In regard to 
brightness, the data are less clear. 
The operated rat has no difficulty in
distinguishing the light and dark
compartments of the Yerkes box. 
Even with total striate cortex re- 
movals, his error scores do not differ 
from those of the normal rat. But a 
more recent analysis of the effects on 
the preoperatively established light- 
dark habit indicates that, though 
subtotal lesions are ineffective in dis- 
turbing the habit, complete removals 
abolish it (Lashley, 1935b). In gen- 
eral, a restricted sector is responsible 
for sensitivity to visual details, 
whereas any small portion of striate 
cortex can mediate the light-dark 
habit (Lashley, 1935b). In the case 
of detail vision, it is locus and not 
mass that is important; yet mass and 
not locus seems critical for brightness 
vision. 

Similar ablation studies of striate 
cortex have been performed using 
cat, dog, and monkey (Kluver, 1941; 
Marquis, 1934; Marquis & Hilgard, 
1937; Smith, 1937). As in the rat, 
visual discrimination depends upon 
the integrity of striate cortex and 
upon no other cortical region, with 
the exception of temporal lobe tissue 
in the monkey (Milner, 1954). (This 
exception, inferior temporal neocor- 
tex, cannot be implicated in vision as 
a primary projection area [Ades & 
Raab, 1949; Orbach & Fantz, 1958].) 
In both cat and dog, bilateral re- 
moval of the occipital lobes abolishes 
the pre-established light-dark habit 
but not the ability to form this habit 
postoperatively. Though a case may 
be made for the phylogenetic corti- 
calization of visual function (Mar- 
quis, 1934), the available data indi- 
cate no discontinuity with respect to 
striate cortex functioning in the 
mammalian series. The most defini- 
tive study of residual visual capac- 
ities in the monkey is that of Kluver 
(1941). He concluded that monkeys 
with total striate cortex removals
were able to discriminate visually 
only on the basis of the total amount 
of light entering the eyes. The same 
conclusion can be drawn from the 
visual conditioning studies. Both 
dog and monkey have been shown 
to learn and retain simple and dif- 
ferential instrumental conditioned re- 
sponses after bilateral striate cortex 
removals (Marquis & Hilgard, 1937; 
Wing & Smith, 1942). 

Four monkeys with massive bilateral
striate cortex removals were
studied for visual status by the present
writer. Two of the monkeys had
partial striate cortex resections involving
the entire lateral occipital
cortex and pole. The other two monkeys
had more complete striate cortex
resections involving the calcarine
fissure and all tissue caudad to the
lunate and parieto-occipital sulci.
Estimates of undegenerated lateral
geniculate neurons in these two monkeys
are listed in Table 1.

These monkeys showed pronounced
visual field defects. Vision
was only occasionally detected in the
dorsal field in the two monkeys with
larger lesions. Nevertheless, the visual
reflexes, pupillary, optokinetic,
as well as the palpebral responses and
spontaneous ocular movements were
observed in all four monkeys. It was
not possible to test acuity with the
usual string tests or depth perception
in the monkeys with larger lesions.
Ocular movement was seldom elicited.

Formal training and testing revealed
the following:

1. The preoperatively established
light-dark habit was lost but could
be reacquired after considerable
postoperative training (see Table 2).
Subject LC4, with no more than a
tiny remnant of striate cortex in the
calcarine fissure of one hemisphere,
and subject B3 reacquired the habit
only after dark adaptation and training
in darkness. Since none of the
monkeys was completely insensitive
visually, lack of retention is attributed
to the extensive scotomata to
which the monkeys had to adjust and
to a corresponding impairment in
acuity and visual attention. These
results support Kluver’s conclusion
that occipitally lobectomized monkeys
retain the capacity to respond differentially
to total light entering the eyes.

2. Color and form discriminations
could be made by monkeys with removals
restricted to the occipital
operculum, and responses were effective
when the stimuli were exposed
tachistoscopically for as short an interval
as 10 msec. The performance
curves are shown in Fig. 2.

TABLE 2
LIGHT-DARK REACTION

                     Preoperative                Postoperative
          Initial Learning     Retention         Retention
Subject   Trials   Errors   Trials   Errors   Trials   Errors
O1        120      38       30       6        90       21
H2        124      23       —        —        180      60
B3*       234      92       0        0        424      146
LC4*      608      212      30       8        843      353

Note.—[...]ion. Noncorrection technique was [...] 0% correct within a sessi[on]
[...] [v]isual cues. The left light, when on, [...] light that the right well was baited.
[...] before they met the criterion.

Fig. 2. Performance of color discrimination as a function of exposure time, for
two monkeys, O1 and H2, following bilateral removal of the occipital operculum.
Red and green circles of light were projected on white plaques.
[Figure: abscissa, exposure time (msec.) at 10, 20, 40, 100, and 500.]

The finding that the monkeys with 
more complete striate cortex lesions 
met the criterion of habit acquisition 
only after prolonged dark adaptation 
appears to be related to Kluver’s 
(1941) observations that his operated
monkeys failed to work in a
normally lit room and to Smith’s 
(1937) observations that his operated 
cats showed disturbances in the abil- 
ity to discriminate intensity differ- 
ences under conditions of increased 
illumination. Accordingly, dark ad- 
aptation may be an important con- 
dition in the evaluation of the post- 
operative visual capacities of mam- 
mals. 

Early in the present century, Franz 
(1911) reported that monkeys with 
occipital ablations were able to re- 
spond effectively to color cues. It is 
now clear from an examination of his 
anatomical diagrams that the lesions
spared a large portion of striate cortex.
Subtotal lesions in the monkeys
of the study reported here also failed 
to result in disturbances in color dis- 
crimination. Despite the destruction 
of macular vision plus scattered peripheral
field defects and a corresponding
impairment in acuity, these monkeys
retained a remarkable ability to
solve the visual tasks. Similarly, 
Harlow (1939) has shown that the 
homonymous hemianopia resulting 
from unilateral occipital lobectomy 
is compensated for as the monkey is 
given the opportunity to develop new 
visual fixations and that pattern 
discrimination could be reinstated. 
The clinical reports have empha- 
sized disturbances of attention fol- 
lowing trauma to the occipital region 
(Fuchs, 1920; Gelb & Goldstein, 
1920; Poppelreuter, 1917). To judge 
from the short latency of response 
and rapidity of learning, no such dis- 
turbances appeared in the monkeys 
of the present study. This was 
equally true of the visual and nonvis- 
ual tasks listed below. Added to this, 
the tachistoscopic studies indicated 
no disturbance in “speed of recogni- 
tion” of colors and forms. An impair- 
ment resembling that reported for 
man was not observed in the monkey. 


Nonvisual Effects and the Theory of 
Mass Action 


The view that cortical areas have a
general as well as a specific function 
was elaborated by Flourens (1824), 
who regarded nervous tissue as form- 
ing a unitary system. This emphasis 
on the psychodynamic inter-related- 
ness of different parts of the brain 
served as a reaction against the view 
of the nonexperimental phrenologists 
and preceded the modern conception
of “localization of function” in the
cerebrum.

Studies of the maze performance of
rats with cortical lesions led Lashley 
(1929) to question the generality of 
the localization hypothesis. Though 
his interpretation of mutual facilita- 
tion of parts of the brain has been 
challenged (Hunter, 1930), the em- 
pirical relationship between efficiency 
of performance and extent of cortical 
destruction remains to be integrated 
into a connectionistic framework 
(Lashley, 1931a). The interpretation 
favored by Hunter (1932) is that 
larger lesions encroach upon more 
sensorimotor tissue, thus reducing 
the number of sensory cues available 
to the rat. This interpretation as- 
sumes that mastery of the maze de- 
pends upon the integrity of the vari- 
ous sense modalities. The evidence 
for this assumption is conflicting. 
Watson (1907) found that each of the 
peripheral end organs may be destroyed
one at a time without seriously
disturbing maze learning ability.
It is well-known, however, that visual 
cues are more critical in the learning 
of the elevated maze as compared 
with the enclosed maze (Tsang, 
1934, 1936). Moreover, animals vary 
among themselves in the extent to 
which they depend upon cues from 
the various sense modalities (Den- 
nis, 1929; Finley, 1941; Krechevsky, 
1935). Further, Honzik (1936) has 
shown that combinations of sensory 
loss have a more deleterious effect 
than unimodality deprivations. (See 
also Casper [1933].) The problem of 
sensory control in the learning of the 
maze has not yet been satisfactorily 
solved. A more relevant question for 
the problem at hand is to what extent 
the established maze habit is sensorily 
controlled. In any case, Hunter's in- 
terpretation is virtually ruled out by 
the evidence that primary sensory 
tissue is involved in more general
nonspecific functioning.

The evidence for rat. In his 1929 
monograph, Lashley reported that 
peripheral sensorimotor deprivation 
has a less deleterious effect upon 
maze-running efficiency than does 
corresponding cortical damage of 
sensorimotor tissue. Peripheral de- 
struction of proprioceptive and visual 
mechanisms (Lashley, 1929; Lash- 
ley & Ball, 1929) and motor incoor- 
dination due to noncortical interfer- 
ence (Lashley & McCarthy, 1926) all 
had negligible effects on maze per- 
formance as compared with the ef- 
fects of cortical removals. For exam- 
ple, ocular enucleation alone had lit- 
tle or no effect on the retention of the 
enclosed maze habit. However, sub- 
sequent occipital removals had a pro- 
nounced effect in disrupting the 
habit (Lashley, 1929). This was in- 
terpreted to mean that it was not the 
loss of vision per se or the elimination 
of visual cues alone that produced
loss of the maze habit in cortically 
blinded rats but that, in addition, an 
associative mechanism must have 
been affected as a result of the de- 
struction of striate tissue. 

There followed a series of investi- 
gations designed to study more fully 
the effects of peripheral and cortical 
sense privation on problem solution 
in the rat. It is convenient to review 
these studies under the following 
headings: 1. type of task, 2. post- 
operative initial learning and reten- 
tion, 3. peripheral plus central de- 
struction, 4. age, 5. extent of lesion. 

1. Type of task. Cortical insult 
has resulted in more pronounced de- 
terioration as compared with periph- 
eral sense privation for the enclosed 
maze (Lashley, 1943; Tsang, 1934, 
1936), the elevated maze (Tsang, 
1934, 1936), and, less strikingly, for 
the latch box with latches of a particular
complexity (Lashley, 1935).
The maze pattern used was always 
Lashley’s Maze 3 (1929). These are 
all tasks which presumably involve a 
multiplicity of sensory cues. Pickett 
(1952), using a unimodality maze 
(kinesthetically cued), was unable to 
demonstrate any effects attributable 
to posterior lesions. Similarly, simple 
discrimination habits based on reac- 
tion to light and darkness are un- 
affected by extrastriate cortex lesions 
(Lashley, 1929). When light and 
darkness serve as cues at successive 
choice points of a maze, however, the 
established habit is abolished by 
striate cortex lesions (Ghiselli, 1938). 
Thus, it appears unlikely that only 
multiple-cued tasks are affected by 
cortical destruction at various loci. 

Using a closed field “intelligence 
test” for rats, Lansdell (1953) re- 
ported marked effects of posterior le- 
sions and no effects following anterior 
lesions. Performance on this task ap- 
pears to be visually guided and the 
deterioration can be attributed solely 
to impaired vision. 

2. Postoperative initial learning 
and retention. The relationship be- 
tween efficiency of performance and 
size of lesion was found to hold for 
postoperative learning as well as 
for postoperative retention (Ghiselli, 
1938; Lashley, 1929). Accordingly,
both learning and retention should be 
affected to a greater extent by corti- 
cal removal than by corresponding 
peripheral sense privation. In gen- 
eral, this has been found to be true. 
Lashley has demonstrated habit loss 
for enclosed maze-running (1929, 
1943), and learning impairment for 
latch box solution (1935). Tsang 
(1934, 1936) has demonstrated both 
habit loss and learning impairment 
for the enclosed and elevated mazes. 
In most cases, the effect appears to
be more pronounced for postoperative
retention. (Compare Lashley
[1943] and Tsang [1934].) 

3. Peripheral plus central destruc- 
tion. Though peripheral blinding ap- 
pears to have a negligible effect on 
enclosed maze performance, the ef- 
fect of combined peripheral plus cor- 
tical blinding is more pronounced 
than the effect of cortical blinding 
alone (Tsang, 1934). The interpre- 
tation for this is obscure. One possi- 
bility is that cortically blinded rats 
are more dependent upon visual cues 
which their residual brightness vision 
affords (general directional cues, for 
example) than are normal or enucle- 
ated rats. 

4. Age. Ocular enucleation just 
after birth again appears to have a 
negligible effect on enclosed maze 
proficiency when the rats are tested 
some months later. Subsequent stri- 
ate cortex removals produce the ex- 
pected deterioration (Tsang, 1936). 
These results were used as an argu- 
ment against the interpretation that 
striate cortex mediates a system of 
visual habits built up during the pro- 
longed visual experience preceding 
the experiment and that these habits 
serve as a frame of reference for spa- 
tial orientation. 

Tsang has shown that occipital le- 
sions produced just after birth lead to 
less pronounced maze-running defi- 
ciency than similar lesions produced 
in the mature rat (Tsang, 1937). It 
is beyond the scope of this review to 
emphasize the theoretical importance 
of these data. (See Hebb [1949] for a 
treatment of the age factor.) 

5. Extent of lesion. In view of the 
complex topography of the occipital 
region of the rat, it is virtually im- 
possible to produce large lesions re- 
stricted to striate cortex alone. Most 
prone to accidental damage are the
adjacent auditory cortex, the retrosplenial
areas, the underlying hippocampal
lobes, and subicular areas.
Finley (1941) criticized Lashley’s 
earlier studies on the grounds that 
his lesions invaded more than one 
functional area of the cortex. She re- 
ported an experiment which was in- 
terpreted as showing that destruction 
of the visual cortex alone produces no 
more disturbance of maze perform- 
ance than does peripheral blinding. 
However, her lesions were for the 
most part small, involving a fraction 
of striate cortex, and her data can be 
harmonized with the known fact that 
small lesions anywhere in the cere- 
brum produce little or no disturbance 
of maze performance (Lashley, 1929). 

In a later study, Lashley (1943) 
reanalyzed the effects of accidental 
damage to extrastriate tissue. He 
concluded that “in no case is there a
reliable difference between the scores 
of animals with lesions in a particular 
structure and those without. ... If 
the severe deterioration of the cases 
with destruction of the occipital cor- 
tex is to be ascribed to the damage to 
such nonvisual structures, then no 
one seems to contribute more than 
any other to the efficiency of per- 
formance” (p. 451). His analysis did 
not disclose any evidence of a cumu- 
lative effect of minor damage to a 
number of structures. 

Rollin’s (1955) recent report is sub- 
ject to the same criticism levelled at 
Finley, namely, that his lesions were 
in many cases too small. Lashley has 
emphasized repeatedly that lesions 
involving less than 10% of the rat 
cortex do not produce appreciable ef- 
fects on maze performance. Of 22 ex- 
perimental subjects, Rollin included 
six with posterior lesions involving 
4.5, 6.0, 8.4, 8.5, 8.6, and 8.8% of the
cortex. It is not at all surprising to
find that these rats made strikingly
better scores than the rest of the 
group. The writer was able to select 
by inspection of Rollin’s anatomic 
chart no more than 10 rats which 
would be roughly acceptable on the 
basis of the following criteria: (a) at 
least 50% of striate cortex of each 
hemisphere was involved in the lesion 
and (b) at least 10% of the entire 
neocortex was damaged. These 10 
rats (24, 27, 42, 50, 60, 68, 81, 98, 155, 
and 161) made markedly inferior 
scores as compared with Rollin’s un- 
acceptably lesioned rats or his enu- 
cleated rats. A t test reveals that the 
difference between the dichotomized 
experimental group, the rats with ac- 
ceptable lesions compared to the rats 
with unacceptable lesions, is signifi- 
cant at the .01 level of confidence. 
Thus, a reanalysis of Rollin’s data 
renders his experiment indecisive and 
even supports Lashley’s position that 
striate cortex lesions have an adverse 
effect upon efficiency of maze per- 
formance. 
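
The selection and comparison just described can be sketched in a few lines. This is an illustration only: the lesion measurements and error scores below are hypothetical (Rollin's individual values are not reproduced here), the field names are assumptions, and the pooled-variance form of the t test is assumed, since the text does not state which form was used.

```python
from math import sqrt
from statistics import mean, variance


def acceptable(rat):
    """Criteria from the text: (a) at least 50% of striate cortex in each
    hemisphere and (b) at least 10% of the entire neocortex damaged."""
    return (rat["striate_left"] >= 50 and rat["striate_right"] >= 50
            and rat["neocortex"] >= 10)


def pooled_t(a, b):
    """Student's t for two independent samples (pooled variance; assumed form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical records, NOT Rollin's data: percentage damage and maze errors.
rats = [
    {"striate_left": 80, "striate_right": 75, "neocortex": 14.0, "errors": 62},
    {"striate_left": 60, "striate_right": 55, "neocortex": 11.5, "errors": 48},
    {"striate_left": 90, "striate_right": 85, "neocortex": 16.0, "errors": 71},
    {"striate_left": 30, "striate_right": 20, "neocortex": 4.5, "errors": 12},
    {"striate_left": 45, "striate_right": 60, "neocortex": 8.4, "errors": 20},
    {"striate_left": 55, "striate_right": 50, "neocortex": 8.8, "errors": 17},
]

# Dichotomize the experimental group by the two lesion criteria.
accepted = [r["errors"] for r in rats if acceptable(r)]
rejected = [r["errors"] for r in rats if not acceptable(r)]

t = pooled_t(accepted, rejected)
df = len(accepted) + len(rejected) - 2
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The obtained t would then be referred to a table of the t distribution with the stated degrees of freedom to judge significance at the .01 level.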

The evidence for monkey. In the 
monkey, effects of striate cortex le- 
sions have heretofore been studied 
with respect to vision only (Kluver, 
1941). Of special interest, however, is 
Kluver’s observation that compari- 
son activity in choosing between dis- 
criminanda (a kind of VTEing in the 
monkey) is impaired postoperatively. 
He states: “We have found . . . that
the removal not only of the striate
areas, but also of other cortical areas
may lead, not to a loss of the previously
established differential response
but merely to a temporary loss or
marked disturbance of the comparison
behavior” (Kluver, 1942, p. 263).
Such an impairment might be expected
to appear following striate
area lesions on nonvisual tasks as
well.

2 The above criteria are the writer’s own
and obviously do not derive from the theory
of mass action. Rollin’s criticism of Lashley’s
reasoning leading to the conclusion that
extravisual area damage has no effect upon
performance appears quite justified.

The experiments to be reported 
herein were undertaken to study the 
nonspecific effects of striate cortex le- 
sions in the rhesus monkey. Assum- 
ing continuity of cerebral organiza- 
tion from rat to monkey, and, in view 
of the pronounced postoperative 
maze disturbances in the rat (inter- 
preted as dementia by Tsang and loss 
of nonspecific facilitation by Lash- 
ley), it seemed reasonable to expect 
behavioral effects of a similar magni- 
tude in the monkey. The attempt to 
produce these disturbances in the 
monkey made it necessary to ana- 
lyze the functions involved in those 
tasks which were diagnostic of deficit. 
Solutions of the maze and latch box
appear to have the following aspects 
in common: 1. Their solution does 
not depend upon the integrity of any 
one sense modality. Instead, solu- 
tion appears to be governed by a mul- 
tiplicity of cues. To what extent the 
habits are sensorily and centrally 
controlled has never been satisfac- 
torily established. 2. A corollary is 
that the number of sensory and 
motor items to be associated appears 
in each case to be many. 3. Both the 
maze and latch box depend, for solu- 
tion, upon a general orientation 
in space and a motor coordination 
commensurate with this orientation. 
Both prescribe a definite path from 
starting box to goal. Thus, a “sense 
of direction” appears to be involved. 
4. Both require a relatively involved
motor sequence. 

The battery of nonvisual tests se-
lected included simple discrimina-
tions (somesthetic and auditory),
tests of generalization and transfer
(in somesthesis), and tests of ability to
respond differentially according to
context (delayed alternation and con-
ditional reaction). These tasks were
chosen on the basis of their proven
value in disclosing behavioral dis-
turbances following cerebral lesions 
in the monkey (Blum, 1951; Chow, 
1951). However, none of these tests 
was judged to be comparable in all 
respects to the enclosed maze in re- 
quirements for solution. A duplicate 
of Lashley’s maze was considered, 
but the idea was abandoned because 
the apparatus seemed poorly adapted 
to the manipulatory skills of the 
monkey. Instead, a stylus maze 
duplicating the pattern of the loco- 
motor maze used by Lashley (Maze 
3), was included in the battery (Fig. 
3). Four monkeys learned these 
tasks in total darkness to insure that 
response would be controlled by non- 
visual cues. Observation was possible 
with the use of a snooperscope, a de- 
vice which converts infrared light to 
visible light seen by the experimenter 
only (Orbach & Chow, 1959). Preop- 
erative retention scores were obtained 
and served as a base line to which 
postoperative retention scores were 
compared. 

The operation appeared to have no 
effect on the nonvisual behavior of 
any of the monkeys: postoperative 
and preoperative retention scores 
were quite comparable (Table 3). 
One monkey with a large lesion had 
difficulty in meeting the criterion on 
the various delays of the alternation 
problem and on the somesthetic ob- 
ject discrimination, but performance 
by the others was unimpaired. It is 
especially to be noted that the stylus 
maze habit was well retained by all 
four monkeys. Nothing resembling a
transient loss of comparison behavior,
described by Kluver in his occipital
lobectomized monkeys, was noted
even during the very first post-
operative session. Lashley’s conten-
tion that striate cortex exerts a facili-
tative action on other regions of the
brain received no support from these
studies.

FIG. 3. STYLUS MAZE. The monkey was required to free the stylus without entering the culs
de sac in order to take food from it. True path was 35 inches long.

The possibility that the use of a
locomotor maze instead of a stylus
maze might have affected the results
and conclusions of the experiment
prompted the writer to return to the
idea of using a locomotor replica of
Lashley’s Maze 3. An apparatus,
suitable for monkeys, was con-
structed. Five naive rhesus monkeys
and one sophisticated one were se-
lected as subjects. The procedure
followed Lashley’s in every detail.
The monkeys learned to traverse the
maze errorlessly, after which they
were peripherally blinded. They were
enormously affected by the lack of 
vision but eventually reached the 
same pre-established level of accuracy.
Following a two-week rest interval,
retention of the habit was determined
(again used as base line to
which postoperative retention was 
contrasted). Large occipital lesions 
were produced, involving virtually 
the entire striate cortex. Testing 
commenced two weeks after opera- 
tion in each case. 

The unilateral lesions appeared to 
have no effect on the maze habit, 
whereas bilateral occipital lesions 
(produced in one or two stages) had a 
consistently adverse effect on maze
performance. All monkeys made a
greater number of errors and traversed
the maze more slowly after both
hemispheres were involved in the lesion
(Orbach, 1959). Disturbances
were in no way as striking as those re-
ported for the rat, but the inefficient
maze performance was in marked
contrast to the effective performance
on the battery of nonvisual tasks and
the stylus maze used previously.
Confirmation of these facts was pro-
vided by one of the maze-deteriorated
monkeys who learned and was tested
on a number of the nonvisual dis-
crimination tasks, none of which suf-
fered as a result of the operation.

TABLE 3
PERCENTAGE SAVINGS IN RELEARNING AS COMPARED WITH INITIAL LEARNING

Small Lesions Large Lesions
Task Subject O1 Subject H2 Subject B3 Subject LC4
Trials Errors Trials Errors Trials Errors Trials Errors

Multiple cue somesthetic discrimination 100 100 100 100 100 100 100 100
(98) (99) (92) (92) (92) (91) (62) (54)
Somesthetic size discrimination 100 100 80 82 100 100 100 100
(92) (82) — — (100) (100) (-48) (4)
Size transfer series 76 86 88 90 69 77 89 95
Somesthetic form discrimination 86 78 75 91 83 92 59 80
— — — — — — (79) (91)
Somesthetic object discrimination 100 100 33 67 0 -200 90 75
(10) (43) (-89) (-100) (83) (50) (80) (50)
Somesthetic object discrimination: first reversal 99 99 33 36 95 93 98 94
(92) (90) (69) (71) (55) (71) (78) (84)
Conditional reaction 100 100 63 58 96 91 95 95
(88) (88) (50) (33) (77) (82) (76) (77)
Reversals 97 20 92 70
(89) (-20) (83) (55)
Auditory localization 100 100 100 100 — — — —
(96) (99) — — — —
Simple alternation 100 100 — — 87 88 100 100
(100) (100) — — (93) (88) — —
Delayed alternation
5 seconds 58 450 — — a a 33 37
10 seconds 100 100 — — a a 100 100
Stylus maze 96 99 97 99 98 99 79 94
Time 99 99 99 95

Note.—Postoperative and, in parentheses, preoperative savings in relearning each task included in the battery
by all four monkeys.

a Indicates that initial learning scores were perfect. Percentage savings in relearning was therefore indeterminate.
Only in the case of performance by Subject B3 on the delayed alternation was deficit suggested.
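
The percentage savings scores of Table 3 are most easily read with the conventional savings formula in mind. The sketch below assumes that formula, which the text does not state explicitly; the function name and the sample scores are hypothetical. It also shows why a score becomes indeterminate when initial learning was perfect, and how a negative value such as -200 can arise when relearning takes longer than original learning.

```python
def percent_savings(initial, relearning):
    """Percentage savings in relearning relative to initial learning.

    Assumed formula: 100 * (initial - relearning) / initial.
    Returns None when the initial score is zero (perfect initial learning),
    because the ratio is then indeterminate.
    """
    if initial == 0:
        return None
    return 100.0 * (initial - relearning) / initial


# Hypothetical scores (trials or errors to criterion), not taken from Table 3.
print(percent_savings(50, 10))   # relearning faster than original learning
print(percent_savings(40, 0))    # immediate retention
print(percent_savings(10, 30))   # relearning slower: negative savings
print(percent_savings(0, 0))     # perfect initial learning: indeterminate
```

On this reading, a score of 100 means that the criterion was met again at once, a score of 0 means that relearning took as long as original learning, and a negative score means that it took longer.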

In summary, the data indicated 
the following: (a) Nonvisual discrimi- 
nations of varying degrees of com- 
plexity and stylus maze performance 
are not affected by bilateral striate
cortex lesions. (b) Locomotor maze
performance suffers as a result of the 
removals. The monkeys do not ap- 
pear “demented,” nor have they 
completely lost the maze habit, since 
some show large savings in relearn- 
ing the maze. This result is to be 
contrasted with the more deteri- 
orated maze performance reported 
for the rat; and with the lack of 
effects on discrimination tasks of 
varying complexity in the monkey. 
(c) The differential effects summa- 
rized under (a) and (b) have been re- 
corded in a single monkey. 

The discrepancy between the data 
for locomotor and stylus mazes is 
puzzling. The requirements for solution
of these two tasks are apparently
different and reflect different neural
mechanisms. The locomotor maze
requires a shift in body position and,
in the absence of vision, prescribes
the learning of a frame of reference
for orientation. The stylus maze, on
the other hand, permits the use of
the body as a frame of reference.
These data, in combination with the
corresponding data for the rat, suggest
that lesions affect nonvisual orientation
involving a locomotor sequence.3

There is no way of ruling out the 
possibility that the disturbance of 
locomotor maze performance is re- 
lated to the large peristriate cortex 
lesions and not at all (or only partly) 
to the removal of “primary visual” 
cortex. If this should eventually 
prove to be the case, it will provide 
additional evidence opposing the tra- 
ditional view that peristriate cortex 
constitutes a visual “association”
area in the monkey. 

The evidence for man. To the infra- 
human experimental evidence can be 
added the scattered clinical reports 
of intellectual impairment following 
damage to the occipital lobes. These 
reports can only be regarded as sug- 
gestive because the anatomic data 
are often incomplete or lacking. Disturbances
in “attention” were
emphasized by Poppelreuter,
Fuchs (1920), and Gelb and Goldstein
(1920) as a major symptom following
gunshot wounds in the occipital
region. Kluver (1927) has reviewed
the studies up to 1927 of visual
functioning following trauma to
the occipital cortex. It is possible to
reinterpret s . . . in
terms of “genera . . .” Some
years ago, German and Fox (1934)
published the case history of a young
woman who was operated for the removal
of a glioblastoma in the left occipital
lobe. Following excision of the
lobe, which included Brodmann’s
Areas 18 and 19 in addition to 17,
the patient showed deficiency in
tests involving “vocal and manual
reproduction of visual experiences.”
Visual recognition was unimpaired.

3 Relative difficulty of the two mazes cannot
be judged on the basis of the present data
because the precise conditions of adaptation
and testing were not equivalent.

In addition to the impairment of
intellect as a sequel of cerebral le- 
sions, there are numerous case re- 
ports of disorientation in space. 
These disturbances vary from defec- 
tive localization of objects, confusion 
of right and left, misconception of 
place and time associated with con- 
fusional states, neglect of half the 
body and of external space, to 
disturbances in following habitual 
routes. The reader is referred to 
Semmes et al. (1955) for the clinical 
references and to Teuber (1955) for 
an evaluation of the effects of occipi- 
tal lesions in man. Teuber empha- 
sizes that vision in the presence of 
focal lesions in the central visual 
mechanism may be altered in a com- 
plex manner which is not revealed by 
the usual methods of plotting scoto- 
mata and giving acuity scores. His 
interpretation is dual and comple- 
mentary; first, that central lesions 
may produce a semblance of agnosia 
but this effect does not really require 
the assumption that higher aspects of 
vision are selectively impaired; and 
second, that lesions of the projection 
system transcend primary sensory 
processes, that receptive cortex may 
have more than a receptive function. 

There has been one recent attempt 
to apply a maze-like task in the study 
of the sequelae of brain damage in 
man (Semmes et al., 1955). The sub- 
jects were men who had sustained 
penetrating injuries of the brain in 
battle. They were given the task of 
locomoting along a path indicated on 
a map. The principal finding was 
that the performance of the group 
with parietal lobe lesions was signifi- 
cantly inferior to that of a control 
group with peripheral injury and to 
that of the brain injured group with- 
out parietal lesions. Some of the men 
in the latter group sustained occipital
injury. It is not possible to evaluate
these results in relation to the prob- 
lem of nonvisual functioning of stri- 
ate cortex without precise informa- 
tion of the locus of the lesion. 

Some years ago, Levine (1952) re- 
ported an analysis of intellectual per- 
formance by a group of 20 soldiers 
suffering from trauma to the occipital 
lobes. Levine was able to compare 
posttrauma scores on the Wechsler- 
Bellevue with pretrauma scores on the 
AGCT. He interpreted the differ- 
ence in scores as indicating deteriora- 
tion on most subtests of the Wechs- 
ler-Bellevue. Largest deterioration 
scores appeared on tests of digit span 
and arithmetic. These results take on
added significance when they are
compared with the less marked deterioration
of peripherally blinded
cases in the same study.
Again, the value of this report
is limited by the fact that the precise
locus and extent of the cortical damage
could not be estimated. In all
such cases, nonstriate cortex was al- 
most certainly involved. 


“FUNCTION” AND THE LOGIC OF 
THE ABLATION METHOD 


There are certain difficulties in the 
use of the ablation method that im- 
pose limitations on the interpretation 
of the experimental results (Chow & 
Hutt, 1953) and the data of non- 
visual functioning of striate cortex 
reported here provide an excellent 
demonstration of this. Let us sup- 
pose we had available to us the data 
that nonvisual discrimination and 
stylus maze performance are not af- 
fected by the operation. Can we con- 
clude that striate cortex does not par- 
ticipate in the integration of non- 
visual behavior in normal monkeys 
or, for that matter, in operated mon- 
keys? Clearly, the locomotor maze
data do not support such a conclusion.
Alternatively, let us suppose
that we had available to us the data 
that nonvisual maze performance is 
affected by the operation. Can we 
conclude that striate cortex is the 
only area mediating successful per- 
formance on the maze? Even with 
histological verification of the lesion, 
the possibility that other areas have 
been rendered nonfunctional by the 
operation (trophic disturbances, dias- 
chisis, etc.) cannot be ruled out and 
so the results of the experiment might 
have no direct bearing on the func- 
tional organization of striate cortex. 
Moreover, the data presented here do 
not begin to suggest a mechanism of 
function of striate cortex. The neu- 
ronal circuitry involved can only be 
determined by applying methods 
other than gross ablation. Finally, 
the behavioral disturbance can be 
specified with any degree of accuracy
only after a thorough analysis of the
tests administered.

The major shortcoming in the in- 
terpretation of testing data both in 
the clinic and laboratory is a lack of 
understanding of the requirements 
for solution of the tests administered. 
This was shown in the paradoxical in- 
consistency of effects on locomotor 
and stylus mazes. A preliminary at- 
tempt was made to resolve the in- 
consistency by specifying the differ- 
ences between the two tasks in re- 
quirements for solution: one task, 
the locomotor maze, demands the development
of a frame of reference for
orientation; the other, the stylus
maze, permits the use of the body as
a frame of reference. The differences
in “function” remain to be elucidated.
The same points have been
made elsewhere in relation to the delayed
response test and frontal lobe
damage (Orbach & Fischer, 1959).

Related to the lack of understand- 
ing of test requirements is the in- 
accurate and inconsistent use of the 
concept “function.” Consider once 
again the results that have been pre- 
sented and emphasized in this review, 
that nonvisual retention disturbances 
following removal of striate cortex 
were found to be pronounced in the 
rat but that attempts to demonstrate 
effects of a similar magnitude in the 
monkey ended either in failure (stylus 
maze) or only partial success (loco- 
motor maze). To what can the dis- 
crepancy of data between the two 
species be ascribed? 

1. Species differences in cortical or- 
ganization. In the rat, monkey, and 
man, the area striata appears to be 
topologically related to the lateral 
geniculate body and to the retina 
(Brouwer, 1927). There can be no 
doubt that this cortical area is spe- 
cialized to participate in vision. 
What does remain in doubt at present
is the degree of specialization. In
the rat, specialization appears to be 
incomplete, as evidenced by the fact 
that learning ability and retention 
are impaired (as well as visual capac- 
ities) when the striate areas are de- 
stroyed. Specialization appears to be 
more nearly complete in the monkey. 

2. Quality of lesions. There has 
been some discussion in the literature 
whether different surgical techniques 
of cortical destruction can influence 
the outcome of the operation. Pick- 
ett (1952) produced some of his le- 
sions by thermocautery, others by 
aspiration. Results of the two meth- 
ods did not prove to be significantly 
different. Rosner (1953), in interpreting
the difference in performance
between his and Lashley’s rats, suggested
that cautery may have effects
extending beyond the bounds of the
primary lesion. The differences in effect
between a clean and scar-forming
lesion have been emphasized
in the clinical literature (Penfield, 
1954). However, the comparable successes
achieved in producing behavioral
disturbances in rat and monkey
with the various surgical methods
render this possibility a remote one.

3. Incomparability of tests. In view 
of the differential effects reported for 
locomotor and stylus mazes, this 
source of discrepancy must be em- 
phasized. Until our understanding 
of the test requirements is improved, 
Lashley has suggested the applica- 
tion of many diverse tests to mini- 
mize this defect. “Satisfactory inter-
pretation of behavioral defects fol- 
lowing brain injuries cannot be de- 
rived from a few tests. Our concep- 
tions of the organization and ele- 
mentary functions of the brain are 
still inadequate for definition of gen- 
eral capacities which may be affected 
by injuries” (1948, p. 155).4 

The discrepancy between the data 
of laboratory and clinic has in fact 
been a thorn for neurological theory 
(Lashley, 1943; Penfield & Jasper, 
1954; Milner, 1954). The relation 
between size of cortical lesion and 
magnitude of behavioral disturbance, 
the empirical mass action relation- 
ship, is more commonly seen in the 
animal laboratory. Lashley’s (1942) 
solution has been to advocate a revi- 
sion of neurological theory, the re- 
duplicated trace theory. However, 
there are more parsimonious explana- 
tions of the mass action relationship 
(Hunter, 1930). Among the explanations
is one that refers to the definition
of “function.”

4 Possible sources of the discrepancy not
considered in this review include length of
postoperative recovery (Lashley, 1948),
amount of training or overtraining on the
tests (Orbach & Fantz, 1958), and nature of
pre- and postoperative experiences (Chow,
1951).

In comparing the functions of stri- 
ate cortex, it is common to overlook 
the fact that visual and nonvisual 
functions are clearly unequal in 
breadth or inclusiveness. The conse- 
quence of this is to assign equal 
weight to these functions and to look 
for similar central mechanisms. To 
illustrate the errors in such a view, 
consider “learning” in relation to the
ablation data. Acquisition of the 
multiple maze is dependent upon the 
integrity of the entire cerebral cor- 
tex (Lashley, 1929). For learning in- 
volving visual cues, it is possible to 
specify particular areas of the cere- 
brum which must be intact: occipi- 
tal, preoccipital, inferior temporal, 
and possibly frontal tissue (Lashley, 
1939, 1948; Orbach & Fantz, 1958; 
Orbach & Fischer, 1959). When it 
comes to learning of visual patterns, 
occipital and temporal lobe tissue ap- 
pear to be implicated to the exclusion 
of other tissue (Lashley, 1939; Or- 
bach & Fantz, 1958). Clearly, if a 
function is broadly enough defined, a 
sufficiently large lesion anywhere in 
the cerebrum will affect the function 
and the mass action relationship is 
apt to be obtained. These data lead 
us to conclude that there must be 
general functions (broadly defined) 
which are affected by lesions any- 
where in the cortex. Superimposed 
on these global functions are specific 
functions such as reception which are 
affected differentially by circum- 
scribed lesions. As a consequence, 
the neocortical lesion large enough to 
produce behavioral impairment has a
dual effect: a specific one depending 
primarily upon locus and a more gen- 
eral one depending primarily upon 
mass. The major problem in an ex- 
perimental program of localization
(or focalization) of function is to define
unitary functions. Whether factor
analysis as a tool can be employed
to advantage in the definition
of function has yet to be demon- 
strated. The issue is not whether the 
mass action relationship holds, but
rather whether the problems that 
have arisen from the mass action con- 
ceptualization, such as the nonvisual 
functioning of striate cortex, can be 
understood on the basis of current 
neurological theory. 

The view espoused here emphasizes
the importance of definition of
“function” by testing procedure. It 
is not sufficiently accurate to speak 
of “learning” or “memory” in a gross 
sense. The results of ablation work 
have focused upon the necessity to 
specify the precise nature of the func- 
tion: learning of pattern discrimina- 
tion; immediate memory as tested by 
delayed response, etc. A major ad- 
vance in neuropsychological theory 
will come when we begin to under- 
stand the nature and significance of 
global and unitary functions.

Research in the area of attitude
change often requires administration 
of a pretest to evaluate attitudes 
toward some topic. It has been sug- 
gested that attitudinal pretests may 
have a sensitizing effect so that a suc- 
ceeding experimental treatment is af- 
fected. To evaluate the effect of the 
treatment on attitude change, a posttest
consisting of the identical ques-
tionnaire used in the pretest is gen- 
erally administered following treat- 
ment. Consequently it is impossible 
to determine whether a significant 
change from pretest score to posttest 
score on a questionnaire is a direct 
result of the experimental treatment 
or of an interaction between pretest 
and treatment. 

In 1949, Solomon, noting that the 
use of a design utilizing a pretest fol- 
lowed by some experimental treat- 
ment and a posttest is prevalent in 
the psychological literature, pointed 
out the types of interpretations which 
could be made from results generated
by such a design. A significant change
from pretest to posttest score is at- 
tributable either to the experimental 
treatment alone or the interaction 
between the measuring instrument 
and the treatment. In order to discriminate
between these interpretations,
Solomon proposed that a de-
sign utilizing an experimental group 
and three control groups be used for 
relevant types of studies. Such 
studies might include investigations 
on transfer of training, on attitude 
change, and on possible effects of ex- 
periments on responses, skills, and 
performance already existing in the 
behavioral repertoire. This “four-
group design” proposed by Solomon 
was taken as the basic design of this 
study. 
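
The logic of the four-group design can be made concrete with a short sketch. The group numbering, the function name, and the posttest scores below are assumptions introduced for illustration and are not taken from Solomon's paper or from the present study: Group 1 receives pretest, treatment, and posttest; Group 2 pretest and posttest; Group 3 treatment and posttest; Group 4 posttest only. The sketch computes only descriptive contrasts among posttest means and is not a significance test.

```python
from statistics import mean


def solomon_effects(g1, g2, g3, g4):
    """Descriptive contrasts on posttest means for the four groups.

    g1: pretest + treatment, g2: pretest only,
    g3: treatment only,      g4: neither (posttest only).
    """
    m1, m2, m3, m4 = (mean(g) for g in (g1, g2, g3, g4))
    return {
        # Average effect of treatment, with and without a pretest.
        "treatment": ((m1 - m2) + (m3 - m4)) / 2,
        # Average effect of having been pretested.
        "pretest": ((m1 - m3) + (m2 - m4)) / 2,
        # Treatment effect among pretested Ss minus that among unpretested Ss.
        "interaction": (m1 - m2) - (m3 - m4),
    }


# Hypothetical posttest attitude scores for each group.
effects = solomon_effects(g1=[8, 8], g2=[4, 4], g3=[6, 6], g4=[4, 4])
print(effects)
```

An interaction contrast near zero would suggest that the pretest did not alter the effect of the treatment; a reliably nonzero value would be consistent with pretest sensitization.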

Campbell (1957) reviews six basic 
designs relevant to the validity of ex- 
periments in social settings, the most 
practicable and complete of which is 
the Solomon four-group design, but as
late as 1957 he lists no studies per- 
formed with this design as the model. 
Canter also has discussed the four- 
group design in application to human 
relations training. In examining the 
literature on attitude change from 
1950 to the present, one finds only a 
single empirical investigation (Piers, 
1955) using the four-group design as
its model. The hypothesis that an in- 
teraction would occur was rejected. 
However, in this study the posttest 
employed a different scale from the 
one used in the pretest, and this fact 
might have minimized the possibility 
of such an interaction. Another inter- 
pretation, of course, is simply that 
the pretest was ineffective in influencing
the effect of the experimental
treatment. Fourteen other studies of attitude
change (Berrien, 1950; Crom . . . ,
1952; Jarrett & . . . ;
Kelley & Volkart,
1952; Kelman, 1953; Lindgren, 1944;
Maas, 1950; Osgood & Tannenbaum,
1955; Plant, 1956; Sawyer, . . . ;
Thistlethwaite & Kamenetzky, 195 . . . ;
Wagman, 1955; Weiss, 1953), which
utilized a pretest-treatment-posttest
design, all failed to control for the
possibility of the interaction effect described
above.

In addition to the interaction ques- 
tion another methodological problem 
is considered here. If we examine the 
characteristics of the latter half of 
our primary design we notice that 
most studies reported in the litera- 
ture are identical in this respect. 
That is, the treatment is followed im- 
mediately by the posttest, being sep- 
arated only by the time it takes the 
investigator to distribute the neces- 
sary materials. The time interval be- 
tween pretest and treatment, how- 
ever, generally varies from a few min- 
utes to several months. In order to 
examine the amount of attitude 

change retained over time, some in-
vestigators have administered a sec-
ond posttest at variable intervals af-
ter the administration of the first.
Because the purpose of the treatment 
in attitude change studies is usually 
transparent, e.g., films emphasizing 
racial tolerance, the posttest or pre- 
test administered in close conjunction 
with the treatment may produce 
highly artificial effects. The Ss are 
probably well aware when they are 
interrogated following a persuasive 
communication that the experiment- 
er expects some change in their atti- 
tudes. The present experiment, 
therefore, will examine the effect of 
the treatment on a posttest adminis- 
tered some time after the presenta- 
tion of this treatment. In order to 
reduce the artificiality in this type of 
design one may induce the S to dis- 
associate the three elements of the 
experiment. One way to accomplish 
this is to separate these elements tem-
porally. The daily experiences of the
Ss may provide enough interference 
so that any association made by the 
S between treatment and posttest 
will be reduced to some extent. 

Another way to determine whether 
or not the purpose of the experiment 
is obvious to the S, and thereby af- 
fects his responses, is simply to ask 
him about his perceptions of the en- 
tire procedure, and if he believed that 
the investigator would return after 
having administered the question- 
naire for the first time. Data of this 
sort were obtained in this study by 
means of a standard interview to be 
described below. 

The following hypotheses were 
evaluated experimentally: 

1. The administration of a pretest 
in an attitude study will interact with 
the experimental treatment in such a 
manner as to influence significantly 
any resulting change in attitude, as 
measured by a posttest, from the at- 
titude of a comparable group receiv- 
ing the experimental treatment but 
not the pretest. That is, the adminis- 
tration of a pretest will sensitize an 
individual so that any change in atti- 
tude resulting from an experimental 
treatment consisting of persuasive 
material of some type will be signifi- 
cantly different from that of an indi- 
vidual not sensitized by a pretest, 
ceteris paribus. 

2. A delayed posttest administered 
some time after the experimental 
treatment will yield a significantly 
different mean score from the second 
or delayed posttest score of a control 
group that was first posttested im- 
mediately following treatment. 


PROCEDURE 
The Pilot Study 


A pilot study was conducted to 
provide information about the nature 
their pretest attitudes toward vivi-
section. The vivisection appeal was
equally effective in changing opinion 
in both the 9-day, pretest-treatment 
interval group and the 1-month in-
terval group as demonstrated by sig-
nificant t ratios of 3.45 and 3.37 be-
tween pre- and posttests. The short-
er interval was chosen for use in the
final study because it appeared more 
likely that an interaction effect would 
be evident when the pretest and 
treatment were relatively close to one 
another in time. 

The corrected split-half reliability
coefficient of the questionnaire was
.85, which was considered sufficiently
high for use of the instrument in the
final study.


Experiment 
One hundred and fifty-six stu-
dents in five introductory psychology
classes at the University of Maryland
served as Ss. The five groups were
randomly assigned to five treatment
conditions. Three of these groups re-
ceived the pretest attitude question-
naire. One of the three groups lis-
tened to the taped provivisection
communication 12 days after taking
the pretest. After treatment, this
group (Group I) was immediately
posttested with the same question-
naire and then posttested a second
time 12 days later. (The purpose of
giving Group I a second posttest 12
days after the first was to provide a
control for Group V.) The other
group (Group IV) was [...]
out having been pretested. Group [...]
answered [...] questionnaire once [...]
The design is summarized below.

In order for the pretest-treatmen 
TABLE 1
EXPERIMENTAL DESIGN

Group    Sequence of conditions
I        Pretest, 12 days, Treatment, Posttest 1, 12 days, Posttest 2
II       Treatment, Posttest
III      Posttest
IV       Pretest, Posttest
V        Pretest, Treatment, 12 days, Posttest


variance must be significant at the 
.05 level. If this relationship did not 
hold there would be no means of as- 
sessing the differences between the 
mean posttest score of Group I, and 
the mean posttest score of Group II 
as attributable to an interaction ef- 
fect between the pretest and the 
treatment. The mean posttest score 
for Group III was compared with 
both pretest scores by use of a simple 
analysis of variance. Since this post- 
test mean represents the only testing 
of one of the groups it can be exam- 
ined in relation to the other pretests 
to further check on the comparability 
of the sample. 
Hypothesis II was examined by ap-
plying a t test for independent means
to the mean posttest score of Group 
V and the second mean posttest score 
of Group I. 


RESULTS 


To further support the contention 
that the Ss had similar attitudes con- 
cerning vivisection, the results of the 
experiment were also examined for 
the presence of this effect. Bartlett’s 
test was applied to the pretest scores 
of Groups I and IV and the posttest 
scores of Group III, these three sets 
of scores and the pretest results of 
Group V, and the five pretests of the 
pilot study and the three of the final 
study combined. All chi squares were 


not significant at the .05 level. The 
conclusion is drawn that the vari- 
ances of all these groups were homo- 
geneous. An analysis of variance was 
then performed on the mean pretest 
scores of Groups I, IV, V, and the 
posttest of Group III, and a second 
analysis was applied on the eight pre- 
test means of the combined groups 
from the pilot and final studies. The 
resulting F ratios were not significant. 
This represents a rather rigorous 
demonstration that for the college 
students represented in the sample, 
opinions about vivisection were ini- 
tially homogeneous. 

With the comparability of the 
groups assured, the analysis of the 
final results was undertaken without 
the use of any untenable assumptions 
as to the homogeneity of the sample. 
Bartlett's test was applied to the 
posttest scores of the final study. The 
groups were judged homogeneous 
with respect to variance since the re- 
sulting chi square was not significant 
at the .05 level. The factorial anal- 
ysis of variance technique was then 
performed on the posttest means. 
Means and standard deviations were 
calculated for the four groups and are 
reported below. 

The results of the analysis of vari-
ance on the posttest means for Groups
I through IV are presented in Table 3.

TABLE 2
SUMMARY OF MEANS AND STANDARD DEVIATIONS OF POSTTEST SCORES

Groups                                         M        SD      N
Group I (Pretest and communication)            42.96    9.06    26
Group II (No pretest and communication)        42.90    5.34    50
Group III (No pretest and no communication)    40.77    5.76    48
Group IV (Pretest and no communication)        40.28    5.48    32

It is evident that the only signifi-
cant F is that for the treatment effect,
while the F ratios for the effects of
the pretest and the interaction be-
tween the pretest and the treatment
are both less than one. The fact that
the vivisection communication is suc-
cessful in changing opinion in a posi-
tive direction implies that the condi-
tions necessary for testing the inter-
action effect of pretesting and treat-
ment are present. The F ratio test-
ing this effect was not significant. It
can be concluded that the pretest
and treatment did not interact, and
that the pretest did not sensitize the
Ss to the communication. The main
effect of pretesting was not signifi-
cant, which is to be expected since
the treatment effect was significant.

TABLE 4
SUMMARY OF t TESTS USED FOR EXAMINATION OF HYPOTHESIS II

Comparisons                                      t       p
Group I pre- and posttest                        1.88    <.05
Group I post- and delayed posttest               <1      >.05
Group I pretest and Group V pretest              <1      >.05
Group I delayed posttest and Group V posttest    <1      >.05
Group V pre- and posttest                        2.31    <.05

Note. p values were determined by use of a one-
tailed table of the t distribution.

To determine whether the varia- 
bility of the pretest mean scores was 
different from that of the posttest 
mean scores appropriate F ratios 
were examined and found to be insig- 
nificant. 

For testing Hypothesis II, a t test
was performed between the mean
posttest score of Group V and the
second mean posttest score of Group
I. However, before this test could be
run, several other t tests had to be
calculated to determine the compara-
bility of the two groups. A summary
of these t tests appears in Table 4.
Since there is no significant difference
between the mean posttest scores of
Group I (delayed posttest score), and
Group V, the conclusion may be
drawn that the time interval between
treatment and posttest, within cer-
tain limits, is not effective in influenc-
ing the posttest magnitude.

TABLE 3
ANALYSIS OF VARIANCE SUMMARY: POSTTEST MEANS

Source           SS      df     MS      F
Treatment (R)    5.78    1      5.78    5.35 (p<.05)
Pretest (C)      .05     1      .05     <1
R x C            .07     1      .07     <1
Error                    152    1.08
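
The F ratios reported for the analysis of variance follow from dividing each effect's mean square by the error mean square. The sketch below only reproduces that arithmetic from the printed SS, df, and error MS values; Python is used purely for illustration, and the names `effects` and `f_ratios` are hypothetical, not part of the original study.

```python
# Recompute the F ratios of the analysis of variance summary.
# Assumption: SS and df are as printed in Table 3; the error mean
# square (1.08) is taken directly from the table.
MS_ERROR = 1.08

effects = {
    "Treatment (R)": {"ss": 5.78, "df": 1},
    "Pretest (C)": {"ss": 0.05, "df": 1},
    "R x C": {"ss": 0.07, "df": 1},
}


def f_ratios(table, ms_error):
    """Return F = (SS / df) / MS_error for each source of variation."""
    return {
        name: (row["ss"] / row["df"]) / ms_error
        for name, row in table.items()
    }


ratios = f_ratios(effects, MS_ERROR)
for name, f in ratios.items():
    print(f"{name}: F = {f:.2f}")
```

Only the treatment effect yields an F above one, which agrees with the pattern described in the text.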


DISCUSSION 


Since the F ratio representing the 
interaction effect of pretesting and 
treatment was insignificant, it can be 
concluded that the act of pretesting 
a group of Ss with a questionnaire 
does not influence their subsequent 
reactions to a persuasive appeal in 
terms of attitude change toward the 
topic under examination. However, 
because the results have been nega- 
tive in this study, generalizations to 
other types of attitudes are limited. 
Conceivably, different kinds of atti- 
tudes, such as those toward ethnic 
groups, toward political candidates, 
etc., might interact with a relevant 
pretest questionnaire in such a way 
as to produce a significant interaction 
effect. The examination of possible 
pretest-treatment interaction effects 
in the type of design suggested by 
Solomon (1949) and others (Jahoda, 
Deutsch, & Cook, 1951) utilizing dif- 
ferent attitudes is necessary before an 
inclusive generalization concerning 
the effects of pretesting on attitude 
change can legitimately be made. 
Solomon (1949) has demonstrated 
this interaction effect using a learn- 
ing situation rather than an attitudi- 
nal one; perhaps pretesting acts as a 
sensitizer in situations where the tak- 
ing of a test acts as a mediator for the 
recall of previously learned material, 
for example, remembering how to 
spell certain words, or recalling cer- 
tain attitudes which have been lying 
latent. In the case of the vivisection 
propaganda utilized in this study, 
there were probably only relatively 
mild attitudes existing that could be 
recalled even when the pretest was 


administered. Perhaps the greater 
the controversiality of a given issue, 
the more likely it is that strong atti- 
tudes will exist and be recalled, and 
the more likely that their subsequent 
modification or strengthening during 
experimental treatment will give rise 
to an interaction effect. It is sug- 
gested that the four-group design ap- 
plied in this study should be used in 
similar attitude change studies at 
least until enough information is ac- 
cumulated about the interaction ef- 
fects of pretests on various types of 
attitudes to understand better the 
use of control in this type of situa- 
tion. 

At the completion of the experi- 
ment, the classes that had received 
pretests were revisited and asked 
three questions concerning their reac- 
tions toward the experimental pro- 
cedure. The three groups involved 
were I, IV, and V. The questions 
asked these groups were as follows: 

1. When answering the questionnaire for the
first time, how many of you felt, at that
time, that I would be back to examine you
in some way later on?

2. Of those who thought I would be back, how
many of you believed that I would be back
to re-examine you with the same question-
naire?

3. When responding to the questionnaire for
the second time, how many of you believed
that I was looking for a change in your
attitude because of the tape you listened
to? (The comment concerning the tape was
omitted from the question with control
Group IV.)


In all three groups the number of Ss 
answering ‘‘yes” to Question 3 con- 
stitutes a large percentage of the total 
group participating in the experi- 
ment. Apparently, college students 
are quite aware of the experimenter’s 
attempt to change attitudes about a 
given topic when they are handed a 
questionnaire for the second time. 
However, in terms of the results of
this experiment, this perception was
not effective in changing their
[...] differentially from the group
which received only the treatment
[...]. Question 1, concern-
ing the S's suspicions about the in-
vestigator's return after the pretest
administration, was answered affirma-
tively by [...] any one individual in both
Groups IV and V, and by 13 Ss in
Group I. Why 50% of Group I and
[...]% of the other two groups
should have responded as they did is
[...]. However, when asked in
Question 2 whether they believed
they would have to repeat taking the
questionnaire, only two indi-
viduals from Group I and one each
from Groups IV and V answered af-
firmatively. The responses to Ques-
tion 3 were predominantly affirma-
tive and would seem to indicate that the Ss
were aware of the purpose of the
experimenter but, in this case at
least, were probably not affected by
this knowledge in registering their
opinions on the questionnaire.
This contention is borne out by the
fact that the Ss in the control group
did not change their attitudes toward
vivisection even though they be-
lieved that the experimenter was
looking for change in their second
response to the questionnaire.

Artificial as it may seem to ad-
minister a posttest immediately after
[...] of a treatment, the
[...] apparently makes little
[...] influencing the attitude
changes of college students. The con-
ditions applied to Group V involved
giving the posttest 12 days after the
application of the treatment so as to
[...] any influence which might
have been generated by immediate
posttesting. The change score was
significant for this group and the 
posttest score was not different from 
either the first or second posttest 
scores of Group I, which was used as 
the control group. Even though the 
subjects perceived the motivations of 
the investigator they were apparently 
not affected by this knowledge as 
evidenced by the fact that the con- 
trol group, in spite of their awareness 
of the intent of the experimenter, did 
not modify their attitudes in the ab- 
sence of a persuasive communication. 


SUMMARY 


This study was designed to investi-
gate a possible interaction effect of an
attitudinal pretest and a persua-
sive communication in the widely
used “pretest-treatment-posttest” at-
titude change research design. The
existence of such an interaction ef-
fect would make doubtful the current
interpretations of the results of stud-
ies utilizing this model without the
proper controls. A four-minute re-
corded provivisection talk and a 10-
item vivisection questionnaire were
used as the treatment and measuring
instrument.

It is concluded that:

1. The administration of a pretest
in a “pretest-treatment-posttest” re-
search design in attitudinal studies
does not necessarily sensitize an indi-
vidual so that his reaction to a given
[...]

2. [...]
affect the amount of attitude change
as compared to the administration of
the posttest some time after the ap-
plication of the treatment.
A current problem in personnel
assessment is that of evaluating the 
accuracy of clinical in contrast with 
actuarial prediction. Meehl’s recent 
monograph (1954) sets forth several 
theoretical issues and empirical find- 
ings. The present paper is concerned 
with techniques for the quantitative 
comparison of the two types of pre- 
diction. For brevity, these types will 
be referred to below as CP and AP, 
clinical prediction and actuarial pre- 
diction, respectively. 

In this paper, the predictions, both
AP and CP, and likewise the cri-
terion, are categorical. This is com-
mon practice. Moreover, the result-
ing classificatory data (cf., e.g., Ta-
bles 2, 3, and 4) naturally entail use
of nonparametric methods, a fact
permitting the desirable characteris-
tic of robustness of inference (Box &
Anderson, 1955).


ILLUSTRATIVE DATA 


The data presented to illustrate the
methods are extracts from a lengthy
study by Apostolakos (1957), which
may be consulted for substantive
detail. The sample (a cross-valida-
tion group) consisted of 188 college
freshman students. The criterion to
be predicted by the AP and the CP
was membership in one of five classes
([...] prelaw, etc.). The in-
formation available for prediction
consisted of eight predictors (the
[...] Psychological Test,
the Cooperative English Test, etc.).
For the AP, the predictors were
[...] statistically (Apostolakos,
1957) by a linear discriminant function
analysis (Rao, 1952) leading to Table
2.¹ For the CP, the eight predictors
were utilized by the clinicians on the
basis of their professional psycholog-
ical experience, giving rise to tables
such as Tables 3 and 4.

Thus, in Table 3, first row, of the 
40 individuals in Class 1, Clinician 1 
classified 30 into Class 1, 2 into Class
2, 0 into Class 3, 1 into Class 4, and 7
into Class 5. Similarly, from the first
column, it is seen that Clinician 1
classified as members of Class 1, 30
students who were in fact in Class 1,
10 who were in Class 2, 5 who were in
Class 4, and 24 who were in Class 5.
The main diagonal (from upper left 
to lower right) indicates that Clini- 
cian 1 classified the following number 
of individuals from each of the groups 
correctly: 30 correctly classified from 
Class 1, 16 from Class 2, 36 from 
Class 3, 30 from Class 4, and 10 cor- 
rectly classified from Class 5. Evi- 
dently the total number of correct 
classifications is the sum of the diag- 
onal entries, a total of 122 (out of 188 
possible) for Clinician 1. Tables 2 
and 4 are to be similarly interpreted. 

Fundamental problems arising in
the examination of these data in-
clude: validity analysis of the AP;
validity analysis of the CP; differen-
tial validity analysis among the AP
and CP; and reliability analyses of
the CP. These problems will be con-
sidered in turn.

¹ The frequencies in parentheses will be dis-
cussed below in the Validity Analysis of the
AP, etc.
VALIDITY ANALYSIS OF THE AP


This analysis, directed at Table 2, 
has two parts. First, the existence of 
significant (i.e., nonzero) validity of 
the AP for the criterion is examined. 
Second, if a significant relationship 
does exist, then the magnitude of the 
relationship, i.e., a validity measure, 
is estimated. 


Test of Significance 


The conventional chi square con- 
tingency table analysis is not correct 
here because the null hypothesis is: 
The number of individuals correctly 
classified by the AP does not differ from 
the number who would be correctly 
classified if the assignment of the indi- 
viduals to the five classes were based on 
chance alone. Hence it is clear that 
significance by the usual Pearson chi- 
square test provides only a necessary 
but not a sufficient demonstration of 
better than chance performance (Lu- 
bin, 1950). 

The frequencies expected on the
assumption of the null hypothesis are
given by:

$$\text{chance frequency in the } i\text{th row and } j\text{th column} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

where

$n_{i\cdot}$ = marginal total of the ith row;
$n_{\cdot j}$ = marginal total of the jth column;
$n$ = total number of individuals.

For example,

$$\frac{(40)(30)}{188}$$

is the chance frequency in the first
row and first column of Table 2, and
is enclosed in parentheses beside the
observed frequency, 13, in this table.
For the purpose at hand, only the di-
agonal theoretical frequencies, indi-
cated by parentheses, in Tables 2, 3,
and 4, are required.

TABLE 1
CORRECT AND INCORRECT ACTUARIAL PREDICTIONS AND CLINICAL (CP1, CLINICIAN 1) PREDICTIONS

                                   AP
                         Correct        Incorrect
CP1                      Predictions    Predictions    Total
Correct Predictions      81             41             122
Incorrect Predictions    6              60             66
Total                    87             101            188
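
As a check on the chance-frequency rule, the diagonal theoretical frequencies can be recomputed from the marginal totals of a complete table. The sketch below uses the Clinician 1 cross-classification (Table 3 of this paper), whose cells are fully legible; Python and the function name `chance_diagonal` are illustrative assumptions, not part of the original analysis.

```python
# Chance (expected) frequencies on the main diagonal of a square
# classification table: e_ii = (row total i) * (column total i) / n.
# Rows are criterion classes, columns are predicted classes.

TABLE_3 = [
    [30, 2, 0, 1, 7],
    [10, 16, 0, 0, 2],
    [0, 0, 36, 0, 4],
    [5, 2, 1, 30, 2],
    [24, 2, 4, 0, 10],
]


def chance_diagonal(table):
    """Return the expected diagonal frequencies under chance assignment."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [r * c / n for r, c in zip(row_totals, col_totals)]


expected = chance_diagonal(TABLE_3)
print([round(e, 2) for e in expected])
print("e =", round(sum(expected), 2))
```

The rounded values agree with the parenthesized entries printed for Clinician 1, and their sum is the expected number of correct classifications, e, used in the significance test.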

The appropriate test of this null
hypothesis of zero validity is due to
Stevens (1938), and is conveniently
termed the “matching problem tech-
nique.” Recent discussions bearing
on Stevens’ basic memoir are by
Mosteller and Bush (1954) and Gil-
bert (1956).

Stevens’ test consists of the calcu-
lation of a critical ratio:

$$\frac{O - e}{SE_e}$$

which, under the null hypothesis, has
a normal distribution to a close ap-
proximation (Anderson, 1943). In
this test criterion,

O = observed number of correct classifications;
e = expected number of correct classifications, on the null hypothesis;
SE_e = the standard deviation of e, given by the formula:

$$SE_e = \sqrt{\frac{\left(\sum_i n_{i\cdot} n_{\cdot i}\right)^2 - n \sum_i n_{i\cdot} n_{\cdot i}\,(n_{i\cdot} + n_{\cdot i}) + n^2 \sum_i n_{i\cdot} n_{\cdot i}}{n^2\,(n-1)}}$$

where $n_{i\cdot}$ and $n_{\cdot i}$ are the marginal totals of the ith row and the ith column.

CLINICAL VS. ACTUARIAL PREDICTIONS 




TABLE 2
ACTUAL CRITERION STATUS AND ACTUARIALLY PREDICTED (AP) CRITERION STATUS

Criterion                            AP Classification
Classification    Class 1     Class 2    Class 3     Class 4     Class 5      Total
Class 1           13(7.45)    3          2           4           [...]        [...]
Class 2           6           4(2.09)    5           9           4            [...]
Class 3           5           3          16(5.53)    8           3            [...]
Class 4           4           5          6           21(9.57)    4            40
Class 5           2           2          2           1           33(14.47)    40
Total             30          17         [...]       [...]       [...]        [...]


In the present example, from the
main diagonal of Table 2, the sum of
the frequencies not in parentheses gives
O=87, while the total of the frequen-
cies in parentheses yields e=39.11.
Application of the above formula for
SE_e results in SE_e=5.42. Hence
Stevens’ test criterion is

$$\frac{87 - 39.11}{5.42} = 8.84$$

Beyond the .001 level of significance,
the number of correct actuarial
predictions is significantly greater
than the number of correct predictions
expected through chance. (Cf. Gil-
bert, 1956, for the case of small n and
[...] other than the normal
approximation.)


Estimation of Degree of Validity 


[...] data in classifica-
tion [...], Pearson’s contingency co-
efficient is in common use.² However,
as with the ordinary contingency chi-
square, this index is not ap-
propriate here. A simple measure,
readily interpretable, is the observed
probability, say p, of correct classifica-
tion, i.e., O/n.

Here, p=46%.

Also of definite utility is a measure,
supplementary to p, consisting of the
gain in correct classification over
chance. Thus, the expected probabil-
ity of correct classification by chance
alone is e/n, which is here 21%; the
gain, then, in using the AP is 25%.

² Compare, however, the Goodman-Kruskal
index under Reliability Analysis of a CP, be-
low.
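
The two descriptive measures just defined are simple proportions. A minimal sketch of the arithmetic for the AP, assuming Python for illustration and using the values O=87, e=39.11, and n=188 quoted in the text (the function name `validity_summary` is hypothetical):

```python
# Observed proportion correct, chance proportion correct, and the gain
# over chance, from O (observed correct), e (expected correct), and n.

def validity_summary(observed, expected, n):
    """Return (p, chance, gain) as proportions of n."""
    p = observed / n
    chance = expected / n
    return p, chance, p - chance


p, chance, gain = validity_summary(87, 39.11, 188)
print(f"p = {p:.0%}, chance = {chance:.0%}, gain = {gain:.0%}")
```

The same function applies unchanged to any clinician's O and e.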


VALIDITY ANALYSIS OF A CP


The analysis is precisely as above
for the AP. For example, referring to
Table 3, Clinician 1 operated above
the chance level since O=122, e
=38.60 and the Stevens’ test yields a
normal deviate of 15.43, highly sig-
nificant. Moreover, p=65%, the
gain over chance being 44%. Again,
e.g., from Table 4, Clinician 10 ob-
tained 140 correct classifications, sig-
nificantly greater than the chance ex-
pectancy of 37.96 since the Stevens’
deviate is 18.62. Also p=74%, the
gain over chance being 54%.
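
The Clinician 1 figures can be reproduced from the marginal totals alone. The sketch below assumes the standard matching-problem variance (the square of the standard error is the quantity (S² − n·ΣaᵢBᵢ(aᵢ+bᵢ) + n²·S) / (n²(n−1)), with S = Σaᵢbᵢ over row totals aᵢ and column totals bᵢ); this is a reconstruction checked against the deviate quoted in the text, not a transcription of the authors' printed formula. Python and the name `stevens_matching_test` are illustrative assumptions.

```python
import math

# Stevens-type matching test for the number of correct classifications.
# row_totals: criterion class sizes; col_totals: predicted class sizes.


def stevens_matching_test(row_totals, col_totals, observed):
    """Return (expected matches, standard error, normal deviate)."""
    n = sum(row_totals)
    s1 = sum(r * c for r, c in zip(row_totals, col_totals))
    s2 = sum(r * c * (r + c) for r, c in zip(row_totals, col_totals))
    expected = s1 / n
    variance = (s1 ** 2 - n * s2 + n ** 2 * s1) / (n ** 2 * (n - 1))
    se = math.sqrt(variance)
    return expected, se, (observed - expected) / se


# Marginal totals of the Clinician 1 table; O = 122 correct classifications.
e, se, z = stevens_matching_test(
    [40, 28, 40, 40, 40], [69, 22, 41, 31, 25], observed=122
)
print(f"e = {e:.2f}, SE = {se:.2f}, deviate = {z:.2f}")
```

Because only the margins and the diagonal sum enter the computation, the full table is not needed once those totals are known.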


DIFFERENTIAL VALIDITY ANALYSIS
AMONG THE CP AND AP


It is convenient to divide this an-
alysis into three parts: first, the com-
parison of a single specified CP with
the AP; second, all of the CP jointly
compared with the AP; third, the
CP compared among themselves.

TABLE 3
ACTUAL CRITERION STATUS AND CLINICALLY PREDICTED (CP1, CLINICIAN 1) CRITERION STATUS

Criterion                            CP Classification
Classification    Class 1      Class 2     Class 3     Class 4     Class 5     Total
Class 1           30(14.68)    2           0           1           7           40
Class 2           10           16(3.28)    0           0           2           28
Class 3           0            0           36(8.72)    0           4           40
Class 4           5            2           1           30(6.60)    2           40
Class 5           24           2           4           0           10(5.32)    40
Total             69           22          41          31          25          188


Comparison of a CP With the AP 


From the foregoing, both the dis-
criminant function predictions, i.e.,
the AP, and, e.g., Clinician 1’s pre-
dictions, CP1, are significantly better
than chance. The question now
arises: how does the discriminant func-
tion compare with Clinician 1, i.e., is
there a significant difference between
the AP and CP1?

A contingency table, such as Ta-
bles 2, 3, 4, could be constructed with
the AP predictions as one factor of
classification and the CP1 as the
other. However, such a table does
not indicate validity directly, i.e.,
which predictions are in fact correct
classifications. From an examination
of Tables 2 and 3, it appears that one
actually wishes to test the signifi-
cance of the difference between the
number of correct classifications, 122,
made by CP1 and the number, 87,
made by the AP.

TABLE 4
ACTUAL CRITERION STATUS AND CLINICALLY PREDICTED (CP10, CLINICIAN 10) CRITERION STATUS

Criterion                            CP Classification
Classification    Class 1     Class 2     Class 3     Class 4     Class 5      Total
Class 1           25(7.02)    3           0           1           11           40
Class 2           1           24(4.77)    0           0           3            28
Class 3           0           0           28(6.38)    2           10           40
Class 4           0           0           0           38(8.94)    2            40
Class 5           7           5           2           1           25(10.85)    40
Total             33          32          30          42          51           188

Therefore, the relevant data may
be presented in a 2X2 contingency
table in which the two rows represent
the subjects who have been (a) cor-
rectly and (b) incorrectly classified by
CP1, and in which the two columns
represent the subjects who have been
(a) correctly and (b) incorrectly clas-
sified by the AP.

For the appropriate analysis, it is
important to note that the data are
correlated, the same subjects being
classified by both CP1 and AP. Hence the
ordinary chi square test is incor-
rect, McNemar’s (1955) test for
correlated frequencies being required.
As an example of the method, con-
sider Table 1 in which

$$\frac{(41-6)^2}{41+6} = 26.06$$

is an observed chi square with 1
degree of freedom. In this applica-
tion there is thus a significant differ-
ence in validity beyond the 1% level,
in favor of CP1 over AP. (Cf. Mc-
Nemar, 1955, for instances in which a
“correction for continuity” is re-
quired.)

The same technique may also be
used to compare Clinician 1 with
Clinician [...]. Other comparisons are
[...]. However, the foregoing anal-
ysis is limited to a contrast of two
sets of predictions. Next to be con-
sidered is a generalization to the joint
analysis of three or more sets of pre-
dictions.

Joint Comparison of a Set of CP With
the AP


Cochran (1950) has extended the
McNemar test for correlated fre-
quencies to situations involving more
than two agents of prediction.

For example, in the present case, 
consider an arrangement of the data 
in a table with 11 columns, corre- 
sponding to the 10 CP and the AP, 
and with 188 rows, corresponding to 
the individuals being classified. In 
the cells of this table, a correct classi-
fication is indicated by a 1, an incor-
rect classification by a 0. Therefore,
the sum of the column headed “AP” 
represents the total frequency of cor- 
rect classifications made by the AP. 
The sum of a row represents the fre- 
quency with which the individual 
corresponding to that row has been 
correctly classified by the 10 CP and 
the AP. 

For the null hypothesis that the
probability of correct classification is
the same for all 10 CP and the AP,
Cochran (1950) has shown that an
appropriate test criterion is

$$Q = \frac{c(c-1)\sum_{j=1}^{c}\left(C_j - \bar{C}\right)^2}{c\sum_{i=1}^{r} R_i - \sum_{i=1}^{r} R_i^2}$$

which possesses an approximate chi
square distribution with c-1 degrees
of freedom if r is not too small, where

$C_j$ = total frequency of correct classification in the jth column;
$\bar{C}$ = the mean frequency of correct column classification;
$R_i$ = total frequency of correct classification in the ith row;
c = number of columns;
r = number of rows.
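
The Q criterion is straightforward to compute from a table of 1s and 0s. The sketch below assumes Python for illustration and uses a small invented data set of 6 individuals and 3 predictors (the study's 188 by 11 table is not reproduced in the text, so these numbers are purely hypothetical); the function name `cochran_q` is likewise an assumption.

```python
# Cochran's Q for r individuals (rows) scored 1 = correct, 0 = incorrect
# by each of c agents of prediction (columns).


def cochran_q(rows):
    """Return Q = c(c-1) * sum((C_j - mean C)^2) / (c*sum(R_i) - sum(R_i^2))."""
    c = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    mean_col = sum(col_totals) / c
    numerator = c * (c - 1) * sum((cj - mean_col) ** 2 for cj in col_totals)
    denominator = c * sum(row_totals) - sum(ri ** 2 for ri in row_totals)
    return numerator / denominator


# Hypothetical illustration only: 6 individuals, 3 predictors.
example = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
q = cochran_q(example)
print(f"Q = {q:.2f} with {len(example[0]) - 1} degrees of freedom")
```

Rows that are all 1s or all 0s contribute nothing to the denominator, so a table consisting only of such rows would make Q undefined; the sketch does not guard against that case.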
However, the inference of direct
relevance here is not the appraisal of
the omnibus null hypothesis³ but
rather is a test of the following null
hypothesis: The average performance
of the 10 CP does not differ from that of
the AP. By the usual rules for parti-
tioning a sum of squares, $\sum (C_j - \bar{C})^2$
may be decomposed into compon-
ents. Multiplication by

$$\frac{c(c-1)}{c\sum_{i=1}^{r} R_i - \sum_{i=1}^{r} R_i^2}$$

then converts each such component
into a Q chi square. In the present
case, the comparison of the frequency
of correct predictions made by the
discriminant function, AP, with the
frequency made by the clinicians as a
group, the CP, yields a chi square of
53.84 with 1 degree of freedom. This
indicates that the frequency of cor-
rect predictions made by the AP, in
this application, was significantly less
beyond the 1% level than the fre-
quency of correct predictions made
by the CP as a group.

³ Just as a matter of interest, [...] the omnibus test cri-
terion [...] chi square with 10 de-
grees of freedom, indicating that, over-all, the
differences in frequency of correct predictions
among the 10 CP and the AP are significant
beyond the 1% level. To find the denominator
of Q, one would not, of course, actually con-
struct a table of 188 rows, but rather form a
separate frequency distribution of the dis-
tinct values of the row totals (Cf. Cochran,
1950).

TABLE 5
CLASSIFY-RECLASSIFY RELIABILITY DATA FOR CLINICIAN 1

Classification                        Classification 1
2                 Class 1    Class 2    Class 3    Class 4    Class 5    Total
Class 1           55         2          [...]      [...]      [...]      [...]
Class 2           2          16         [...]      [...]      [...]      [...]
Class 3           0          [...]      [...]      [...]      [...]      [...]
Class 4           2          2          1          31         1          37
Class 5           10         [...]      [...]      [...]      [...]      [...]
Total             69         22         41         31         25         188


Comparison Among the CP 


ed the impor- 
variation in 
set of clini- 


r r hypothesis 
of interest here, viz., the 10 CP are 


t beyond the 1% 
oncluded that the 


significantly in the 
number of correct predictions. 


Further comparisons among par- 


ticular groups of CP may, of course, 
be made. 


RELIABILITY ANALYSIS OF A CP


As an example of the problem, con-
sider Table 5, in which the results of
a second set of predictions by Clini-
cian 1 are arranged with the initial
set into a “classify-reclassify” setup.
This is analogous to the usual “test-
retest” reliability situation.

Unlike Table 3, the ordinary con-
tingency chi square analysis may be
correctly applied to Table 5 to test 
the null hypothesis of no relationship 
between the two sets of classifica- 
tions, the ‘‘matching problem tech- 
nique” being irrelevant here. For 
Table 5, the chi square value is 
476.76 with (5 —1)(5—1) =16 degrees 
of freedom, indicating a significant 
degree of consistency.* 

Having established the existence of 
classify-reclassify reliability, a meas-
ure of the degree of such relationship
is customary. As an improvement in 
ease of interpretation over Pearson’s 
contingency coefficient, the result of 
recent work by Goodman and Krus- 
kal (1954) (in elaboration of earlier 
suggestions by Guttman [1941]), is 
suggested. The rationale is based on 
the following considerations: If asked 
to guess the status in, for example, 
the second classification of a random- 
ly chosen individual, one would guess 
the modal class (or, if there is a tie, a 
random choice among the ties). 
However, insofar as there is good re- 
liability between the first and second 
classifications, then, if the first clas- 
sification information is available, an 
improved guess as to second classifi- 


cation status would be the class the 
individual had been assigned by the 
first classification. (Since symmetry 
is desired in the reliability concept, 
“first” and “second” may be inter- 
changed above.) 

The Goodman-Kruskal index, λ,
has a direct operational definition,
viz., λ = the relative decrease in probability
of erroneous guesses of the
status, in either classification, of randomly
chosen individuals, as one
goes from the information on one-
classification-only situation to information
on both classifications.

In the present case, the estimate of
λ is .75. This is computed as

\[
\frac{2[55+16+36+31+19] - (63+69)}{2(188) - (63+69)} = \frac{182}{244} \approx .75,
\]

i.e., the numerator is 2(sum of the diagonal values) − (sum
of the marginal frequencies of the
two modal classes), and the denominator is
2n − (sum of the marginal frequencies
of the two modal classes). The reliability
of the AP is, of course, perfect,
so that an analysis such as the foregoing
is unnecessary.
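
The arithmetic behind the estimate of λ can be retraced in a few lines. The following is only a minimal sketch of the computation as stated in the text: the function name `reliability_lambda` and its argument names are illustrative assumptions, and the inputs are the diagonal values, modal marginal frequencies, and n quoted above for Table 5.

```python
def reliability_lambda(diagonal, modal_marginals, n):
    """Index as described in the text:
    [2 * (sum of diagonal) - (sum of the two modal marginals)]
    divided by [2n - (sum of the two modal marginals)].
    Names are illustrative, not taken from the original paper."""
    modal = sum(modal_marginals)
    numerator = 2 * sum(diagonal) - modal
    denominator = 2 * n - modal
    return numerator / denominator


# Values quoted in the text for the classify-reclassify table.
lam = reliability_lambda([55, 16, 36, 31, 19], [63, 69], 188)
print(round(lam, 2))
```

With the quoted figures the numerator is 182 and the denominator 244, which rounds to the .75 reported.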
A CRITIQUE AND EXPERIMENTAL DESIGN FOR THE
STUDY OF THE RELATIONSHIP BETWEEN
PRODUCTIVITY AND JOB SATISFACTION


HARRY C. TRIANDIS 
University of Illinois 


Industrial psychologists have often 
assumed that a relationship exists be- 
tween output and job satisfaction. 
The satisfied worker, they reasoned, 
must also produce more than the dissatisfied.
However, in many cases
where they tried to substantiate this 
hypothesis the relationship between 
the two variables was found to be by 
no means clear. 

Brayfield and Crockett (1955), af- 
ter reviewing the literature, con- 
cluded that there is little evidence of 
a relationship between morale sur- 
veys and performance. A sample of 
the publications in this area reveals 
that some studies obtained positive, 
some negative, and some no relation- 
ship between the two variables. For 
instance, Bernberg (1952) with work- 
ers in an aircraft factory and Kristy 
(1952) with post office workers, found 
no relationship between job satisfac- 
tion and the efficiency of individuals. 
Heron (1952) found a positive corre- 
lation of .31 between job satisfaction 
and a general index of satisfactoriness 
for 144 bus conductors. Katz and 
Kahn (1951) found a negative rela- 
tion between job satisfaction and rat- 
ings of output for work-teams on the 
railroad; Katz, Maccoby, and Morse 
(1950) found no relationship for
groups of clerical workers. Giese and
Ruter (1949) found a positive correlation
of .19 for departments in a
retail firm. Halpin (1954) found
that the “employee-centered” aircrew
commanders he studied had contented
but inefficient crews. Bavelas
(1953) and Leavitt (1951) found that
the communication pattern of an
organization may be an influential
determinant of satisfaction and output;
patterns of communication which 
permitted tight organization evolved 
more quickly, were more stable, and 
more efficient, yet at the same time 
the morale was low. The evidence is 
at least enough to refute the widely 
held belief that satisfied workers and 
high output necessarily go together. 
Some of the confusion which is 
caused by the above mentioned con- 
tradictory results may be eliminated 
if we undertake a logical analysis of 
the relationship between productiv- 
ity and job satisfaction. Figure 1 is 
the result of such a logical analysis. 
A third variable, namely pressure for
high production (P), is required for this
analysis. The value of P increases
monotonically from Point A through
Points C, D, E, J, and is maximum at
G. The output-job satisfaction curve
of Fig. 1 is of course hypothetical.
However, it indicates several interesting
features. The values of x and
y are in standard score form.¹ Point A
is the condition of maximum satisfac- 
tion; it is a condition found only in 
Utopia, where the worker “gets out” 
a great deal—in pay, satisfaction, 
pleasure of making use of his abilities, 
etc.—and ‘‘puts in” very little—min- 
imum effort, fatigue, risks of acci- 
dent, time, injury to health, dissat- 


pany the tim 


ficiently consis e 
ratio of the time taken to do the job over the
standard time as a measure of productivity.
Satisfaction scores may be obtained from a 
standardized attitude scale. 
FIG. 1. THE OUTPUT-JOB SATISFACTION CURVE

Note: x is output
+y is positive job satisfaction
−y is negative job satisfaction


restriction of 
movement, etc. Condition A involves 
the minimum Pressure for high pro- 


faction. If pressures towards greater 
production are present, we move
along the curve, through Points H 
and B to Point C. Here job satisfac- 
tion is still high, and production is 
very high. From Point C on, how- 
ever, any increase in the pressure to 
produce will increase dissatisfaction, 
so that production will decrease. 
Point D represents a condition where 
the worker is indifferent about his 


work, and produces an amount x
that permits him to “get by.” If
pressures increase, say he is under
threat of firing, he ma 
than at Point D 


er extreme pressure 
threat for his life, 




It is a matter of speculation as to 
where along the production-job satis- 
faction curve the various studies 
summarized in the second paragraph 
of this paper may actually be lo- 
cated. It is probable that Bernberg 
and Kristy worked close to Point D, 
Heron somewhere between D and C, 
Katz between D and E in the rail- 
way study, and around D in the cler- 
ical study. The Halpin study was 
probably located around H or B, and 
finally, in the Bavelas and Leavitt 
studies the circular and chain pat- 
terns of communication created con- 
ditions similar to those at H, while 
the Y and wheel patterns of organiza- 
tion were more typical of N. 

At any rate, the foregoing analysis 
suggests that the current methods of 
studying the relationship between 
output and job satisfaction are in- 
adequate. Positive, negative, or no 
findings are equally likely. What is 
needed is a systematic exploration of 
the values of x and y under a variety 
of conditions. At this point, how- 
ever, certain value judgments are un- 
avoidable. The most obvious value 
judgment involves the question of 
what combination of x and y is most 
desirable. The present author takes 
the position that the ideal condition 
is one where the workers are most 
satisfied and output is highest at the 
same time.² If this is granted, then
we may wish to study under what
conditions z = f(x, y) is a maximum.
If we assume that x, the productivity,
and y, the job satisfaction, are in
standard score form we can proceed
rather simply. It is well known (e.g.,


2 Here considerations of a nonscientific nature
were brought into the picture. A value
judgment was necessary and it was assumed 
that the current humanistic philosophic posi- 
tion, which is widely held in the U.S.A., would 
require that both the well-being of the enter- 
prise (maximizing productivity) and the well- 
being of its employees (maximizing of job 
satisfaction) be considered. 


Smith, Salkover, & Justice, 1947) 
that the necessary conditions for 
maximizing z are the following: 

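\[
\frac{\partial z}{\partial x} = 0 \quad \text{and} \quad \frac{\partial z}{\partial y} = 0. \qquad [1]
\]

The sufficient conditions are

\[
\frac{\partial^2 z}{\partial x^2}\,\frac{\partial^2 z}{\partial y^2} - \left(\frac{\partial^2 z}{\partial x\,\partial y}\right)^2 > 0 \qquad [2]
\]

\[
\frac{\partial^2 z}{\partial x^2} < 0 \quad \text{and} \quad \frac{\partial^2 z}{\partial y^2} < 0. \qquad [3]
\]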


At this point it is necessary to assume 
the form of z = f(x, y). Let us assume
that:

\[
z = a \ln x + b \ln y + cxy \qquad [4]
\]

where a, b, and c are positive constants.
The form of the function is
arbitrary; it was chosen because it
has maxima and because it makes z
more dependent on xy.³

If we compute the required deriva- 
tives we obtain: 


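\[
\frac{\partial z}{\partial x} = \frac{a}{x} + cy \qquad [5]
\]
\[
\frac{\partial z}{\partial y} = \frac{b}{y} + cx \qquad [6]
\]
\[
\frac{\partial^2 z}{\partial x^2} = -\frac{a}{x^2} \qquad [7]
\]
\[
\frac{\partial^2 z}{\partial y^2} = -\frac{b}{y^2} \qquad [8]
\]
and
\[
\frac{\partial^2 z}{\partial x\,\partial y} = c \qquad [9]
\]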


Equations [7] and [8] satisfy Equa- 
tions [3] if a and b are positive. 
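
The derivatives in Equations [5] through [9] can be checked numerically against finite differences of Equation [4]. This is a rough sketch only: the constants a = b = 1, c = .5 and the evaluation point (1.5, 2.0) are arbitrary assumptions chosen so that the logarithms are defined (x and y positive), and the helper names are illustrative.

```python
import math

# Hypothetical positive constants; the paper leaves a, b, c unspecified.
A_CONST, B_CONST, C_CONST = 1.0, 1.0, 0.5


def z(x, y):
    """Equation [4]: z = a ln x + b ln y + c x y (defined for x, y > 0)."""
    return A_CONST * math.log(x) + B_CONST * math.log(y) + C_CONST * x * y


# Analytic derivatives, Equations [5] to [9].
def dz_dx(x, y):
    return A_CONST / x + C_CONST * y


def dz_dy(x, y):
    return B_CONST / y + C_CONST * x


def d2z_dx2(x, y):
    return -A_CONST / x ** 2


def d2z_dy2(x, y):
    return -B_CONST / y ** 2


def d2z_dxdy(x, y):
    return C_CONST


def numeric_derivatives(x, y, h=1e-3):
    """Central-difference estimates of the same five derivatives."""
    fx = (z(x + h, y) - z(x - h, y)) / (2 * h)
    fy = (z(x, y + h) - z(x, y - h)) / (2 * h)
    fxx = (z(x + h, y) - 2 * z(x, y) + z(x - h, y)) / h ** 2
    fyy = (z(x, y + h) - 2 * z(x, y) + z(x, y - h)) / h ** 2
    fxy = (z(x + h, y + h) - z(x + h, y - h)
           - z(x - h, y + h) + z(x - h, y - h)) / (4 * h ** 2)
    return fx, fy, fxx, fyy, fxy


x0, y0 = 1.5, 2.0  # arbitrary positive test point
analytic = (dz_dx(x0, y0), dz_dy(x0, y0), d2z_dx2(x0, y0),
            d2z_dy2(x0, y0), d2z_dxdy(x0, y0))
numeric = numeric_derivatives(x0, y0)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_error < 1e-5)
```

Both second derivatives in [7] and [8] come out negative for any positive a and b, which is the condition required by Equations [3].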

3 Again considerations of humanistic ethics 
were used as guides, as described in Footnote 
2. American values tend to approve of a bal- 
ancing of opposing forces, and it is considered 
that maximizing the influence of xy would be 
consistent with such values. 
If we satisfy Equations 1 and
solve Equations 5 and 6 simultaneously,
we obtain a = b. Equation 2 leads to


\[
\frac{ab}{x^2 y^2} - c^2 > 0
\]
and we know that the maximum is at
point (x₁, y₁) when

\[
\frac{ab}{x_1^2 y_1^2} = c^2 .
\]

If a = b = k we have

\[
x_1^2 y_1^2 = \frac{k^2}{c^2} \quad \text{or} \quad x_1 y_1 = \pm\frac{k}{c} \qquad [10]
\]

It is well known that the x₁y₁ product
in an equation such as [10] is
maximized when x₁ = y₁. Hence, we
can conclude that z will be maximum
if x = y. In Fig. 1 this condition is
represented by Point M.

The procedure for the study of the 
relationship between output and job 
satisfaction, then, is as follows: First, 
we must obtain the norms for work- 
er output and worker job satisfaction 
for the particular type of work which 
we are to study. Second, we must 
show each worker in the sample on a 
graph such as the one of Fig. 1. This
will permit us to obtain a group of 
workers who, for our purposes, can 
be considered to be operating near 
Point M (see Fig. 1). Third, we must 
compare the workers who are operat- 
ing under conditions M with the 
workers who are operating under all 
other conditions. In this comparison 
we will be interested in studying per- 
sonality variables, group structure 
variables, differences in the type of 
supervision, training of the super- 
visor, and whatever other variables 


are of interest to the industrial psy- 
chologist. 


SUMMARY 


It is argued that the present approach
to the study of the relationship
between employee output and
job satisfaction is not fruitful. What
is needed is a comparison of the
characteristics of workers who are operating
at a satisfactory level of both
output and job satisfaction with
those of workers and groups of workers who
are not operating at such a level. A
procedure is described which will permit
the location of the workers who
are operating at this “optimal” level.
LEARNING THEORY AND SCHIZOPHRENIA: A COMMENT


RICHARD DE MILLE
University of Southern California


Taking generalization as his pivo- 
tal concept, Mednick (1958) has 
worked out a learning-theory expla- 
nation of aspects of schizophrenia. 
The progress of a patient from an incipient
stage, with great anxiety, to a
chronic stage, with little or no anxiety,
is described in terms of a complex,
mutually causal relationship
between generalization and anxiety.
In the incipient stage, anxiety, as 
drive, increases stimulus generaliza- 
tion. In turn, stimulus generali- 
zation increases anxiety by render- 
ing formerly nonthreatening stimuli 
threatening. This increase of anxiety 
brings about a further increase of 
stimulus generalization, and so on. A
transitional stage follows in which 
anxiety level stabilizes and then di- 
minishes, owing to a new effect of 
generalization. Whereas in the incipi- 
ent stage the effect of stimulus gen- 
eralization is to make additional cues 
capable of provoking the anxious re- 
sponse, in the transitional stage the 
effect of generalization is to bring
about “remote, irrelevant, tangential”
(Mednick, 1958, p. 324)
thoughts which compete with potential
anxiety-arousing cues, prevent
the anxious response, and reduce
drive. In the chronic stage, these
“irrelevant” thoughts, having been 
reinforced many times by tension re- 
duction, are the most probable re- 
sponses in the patient’s repertory. 
Thus a condition of low anxiety is 
reached and is maintained through a 
general avoidance reaction. 

As this theory stands, it contains
an elusive but important flaw: the
ability of something called variously
“generalization” and “stimulus generalization”
to have two quite opposite
effects, in the absence of any
formal statements justifying such
versatility. In the interests of good
theory, the concept of stimulus generalization
should remain the same
throughout, or should be qualified
formally. If it remains the same, then
no “irrelevant” thoughts can occur to
the schizophrenic to reduce his anxiety,
because stimulus generalization
will render all thoughts relevant and
capable of provoking the anxious response.
On the other hand, it is obvious
that response generalization is
what the author is talking about
when he speaks of “a highly generalized,
remote, irrelevant, tangential
associate” (p. 324). The concept
of response generalization, likewise,
should remain the same throughout
or should be qualified formally. If it
remains the same, anxiety will not increase
in the incipient stage, because
competing responses will occur in
sufficient number to nullify both anxiety-producing
thoughts and the anxious
response per se.

In the context of this theory, the 
behavioral data require an arrange- 
ment in which stimulus generaliza- 
tion is the more effective process 
when anxiety is moderate, and re- 
sponse generalization the more effec- 
tive process when anxiety is extreme. 
This can only be achieved by the pos- 
tulation of two distinct and different 
functional relationships for stimulus 
generalization and response generali- 
zation as functions of drive. 

Without formal specification of the 
conditions under which cited events 
will occur, concepts take on animistic
qualities and act in self-determined 
and unpredictable ways. The present 
case is analogous to a discussion of, 
say, the recall of completed and un- 
completed tasks in which it is not 
stated under what conditions uncom- 
pleted tasks will predominate and un- 
der what conditions, completed tasks. 
The investigator begins with a hy- 
pothesis that there is a tendency to 
complete the uncompleted tasks and 
that the higher the motivation level 
of the subject, the more uncompleted 
tasks will be recalled. He knows, 
however, of experiments in which 
subjects recall more completed tasks.
In order to account for all the data he 
extends his hypothesis thus: there is 
a tendency to complete uncompleted
tasks which increases with increased 
motivation until a point is reached at 
which the tendency is so strong that 
completed tasks begin to predomi- 
nate in recall. This hypothesis has 
much in common with the hypothesis 
that generalization and anxiety are 
reciprocally augmenting until a point 
is reached at which generalization is 
so wide that it reduces anxiety. In 
either case, a remedy may be sought 
in further theoretical articulation. 
A REPLY TO A COMMENT

The paper under discussion (Mednick,
1958) suggests that chronicity
in schizophrenia is brought on by the
repeated occurrence of remote associative
responses in the context of anxiety
provoking thoughts. These remote
associates are anxiety reducing
since they remove the anxiety provoking
thoughts from awareness.
The mechanism used to explain the 
introduction of these remote associ- 
ates is increased drive leading to in- 
creased generalization. Mr. de Mille 
(1959) makes two points: 

1. If generalization is the agent 
which elicits the remote associate 
then “no ‘irrelevant’ thoughts can occur
to the schizophrenic to reduce his
anxiety, because stimulus generaliza- 
tion will render all thoughts relevant 
and capable of provoking the anxious 
response.” It is therefore implied 
that no drive reduction (and conse- 
quently no learning of remote re- 
sponses) could occur. 

Stimulus generalization may in- 
deed tend to “render all thoughts 
relevant,” but not necessarily equally 
relevant. Mr. de Mille’s point is well 
taken if you assume a generalization 
gradient with zero slope. However, 
if the generalization gradient does 
show the usual drop off from its high
point then the net effect of the pos- 
ited associative transition must be a 
reduction in drive. The amount of 
drive reduction will vary with the 
degree of associative similarity be- 
tween the original anxiety provoking 
thought and the remote associate, the 
slope of the gradient and the level of 
response to the original anxiety pro- 
voking thought. 


In addition there is a mechanism 
for the production of the remote as- 
sociate which minimizes the role of 
generalization. Under conditions of 
high drive some remote associations 
(which under conditions of low drive 
would be below the evocation thresh- 
old) may rise above the evocation 
threshold and may be emitted. These 
responses need not be related to the 
original anxiety provoking thought. 
They will, however, interfere with the 
anxiety provoking thought and be
followed by drive reduction. 

2. A related criticism that Mr. de
Mille offers concerns the explanation 
of the incipient phase of the illness. 
It was proposed that anxiety and 
stimulus generalization could recipro- 
cally augment each other and that 
this process could produce patholog- 
ically high levels of both anxiety and 
generalization. However, Mr. de 
Mille argues that “anxiety will
not increase in the incipient stage be- 
cause competing responses will occur 
in sufficient number to nullify both 
anxiety-producing thoughts and the 
anxious response per se.” That is, the 
anxiety level will remain relatively 
low, since “competing responses will 
occur in sufficient number” to keep 
the anxiety level from ever rising at 
all. He does not indicate the source 
of his prediction regarding the num- 
ber of competing responses that will 
occur nor how many competing re- 
sponses will be “sufficient.” 

However, from recent theoretical 
analyses of competing response situa- 
tions (Hill, 1957; Taylor, 1956), it is 
clear that appreciable numbers of
competing remote associative responses
will only occur at very high
levels of drive. As Hill has indicated,
it is very unlikely that remote associ- 
ates will intrude in great number on 
an ongoing thought process, except 
under conditions of very high drive. 
Thus, it is in turn unlikely that the 
competing remote associative re- 
sponses will occur to any important 
degree early enough to “nullify both 
anxiety-producing thoughts and the 
anxious response per se.” 

What is more likely is that the re- 
mote associates will occur in impor- 
tant numbers only after the recipro- 
cal augmentation spiral has carried 
the anxiety level to some abnormal 
level. Further, in an individual suf- 
fering his first acute breakdown, 
these avoidant associates (associates
which function like avoidance responses)
will occur only fleetingly.
They will be followed by momentary
periods of drive reduction and thus
be reinforced. It is only repeated
breakdowns of this nature that offer 
the conditions for learning of impor- 
tant numbers of these avoidant asso- 
ciates. It would seem that one way of 
defining degree of chronicity would 
be degree of learning of these avoidant
associates.

Mr. de Mille objects to the interchanging
of the terms “stimulus generalization”
and “generalization” in
my paper (1958). Perhaps for the
sake of clarity all uses of the term
“generalization” in a section labeled
Stimulus Generalization should be read
“stimulus generalization.” Throughout
the remainder of this paper, all
uses of the term “generalization”
should be read “stimulus generalization
and/or mediated, associative
generalization.”
The main purpose of this paper is
to examine the work on the structure
of groups or teams. The emphasis
will be on communication structure,
an aspect of group behavior that has
received much attention in the recent
experimental and theoretical literature.
The term “structure” refers
here to a relationship in a group, e.g.,
“communicates to.”

The following questions initiated 
this survey: 

1. How can the interactions or 
communications of a group, its struc- 
tural characteristics, be measured? 
2. How are structural character-
istics related to group performance? 

Several areas contribute answers. 
The areas include sociometry, the 
mathematical techniques growing out 
of sociometry, and the group network 
studies. The contribution of mathe- 
matical techniques in answering the 


first question will be considered here. 


Some answers to the second question 


1 Prepared under Contract N7 onr-37008,
NR-154-079, between the American Institute
for Research and the Office of Naval Research,
Psychological Sciences Division, Personnel and
Training Branch, as part of a research project
on team training and performance. The
authors wish to thank Ardie Lubin and R.
Duncan Luce for their helpful reviews of the
paper.

2 Now at Walter Reed Army Institute of
Research.
will be considered in a subsequent
paper. 


SOCIOMETRIC AND RELATED
TECHNIQUES 

There has been a considerable de- 
velopment of techniques for the 
description and analysis of the rela- 
tionships between group members. 
The initial stimulus for this work 
came from the area of sociometry. 
Two major changes have, however, 
occurred in these techniques since 


their inception. 

1. Originally, they were concerned 
solely with the pattern or structure 
of likes and dislikes within a group. 
They soon developed, however, to in- 
clude any pattern of relation. 

2. Originally, they were primarily 
graphical. Now they include the use 
of mathematical techniques. 

Examples of the original approach 
can be found in the work of Moreno 
(1934) and Jennings (1950). The sociogram
or graphic presentation of
the relationships in a group will not
be considered here. Moreover, since 
many of the indices developed out of 
earlier sociometric work are closely 
tied to the use of the sociogram, and 
also because they have been sum- 
marized in other surveys (Lindzey & 
Borgatta, 1954; Proctor & Loomis, 

1951), the emphasis here will be on 
the mathematical approaches. The
survey by Proctor and Loomis (1951) 
gives a full treatment of the work on 
indices and statistical analysis that 
stems directly from the sociometric 
tradition. 
Work on the techniques develop- 
ing out of sociometry can be cate- 
gorized into the following areas: (a) 
construction of indices for group and 
individual characteristics; (b) enu- 
meration of structures; (c) comparison 
of groups; (d) analysis of subgroups; 
(e) assignment of individuals to sub- 
groups; (f) other approaches: graph 
theory, logic of relations. Although
in most cases the techniques were de- 
vised for the relationships of choice 
and rejection, they can be applied di- 
rectly to relationships such as “inter- 
acts with” or “communicates with.” 
One of the aims of this survey is to 
set forth the available techniques and 
to indicate their possible application.
Most of the techniques discussed
here have not been applied extensively.
Their practical and theoretical
usefulness cannot, therefore, be
definitively evaluated at this point.
The first major step in the mathematical
treatment of sociometric material
was noting that it could be
cast in the form of matrices, and the
operations of matrix algebra applied
to it. This step was made by Forsyth
and Katz (1946). ...
the cell entry. If a given relationship
exists between individuals i and j
(e.g., i speaks to j), then a_ij > 0. If not,
then a_ij = 0. The treatment of the
diagonal, a_ii, depends on the purposes
of the investigator. In most
cases, zeroes will be entered in the
diagonal.

The simplest type of matrix to 
summarize the relations within a 
group is one that simply notes
whether or not the given relationship
exists between a pair of individuals.
In this case a_ij = 1 or 0. For example,
if in a four-man group it is
found that a speaks to (or likes) c; b
does the same to a, c, and d; c to b
and d; and d to a and b, the matrix
would appear as in Table 1.


TABLE 1
MATRIX WITH BINARY ENTRIES

                Receiver
Sender     a    b    c    d
   a       —    0    1    0
   b       1    —    1    1
   c       0    1    —    1
   d       1    1    0    —


In some cases, the investigator is 
dealing with a relationship that 
varies in strength or frequency. For 
example, it may be that b speaks fre- 
quently to a, infrequently to c, and 
frequently to d, etc. In this case, the 
matrix might appear as in Table 2. 

The positive cell entries have been
weighted according to frequency.

TABLE 2
MATRIX WITH WEIGHTED ENTRIES

           a    b    c    d
   a       —    0    1    0
   b       2    —    1    2
   c       0    1    —    2
   d       2    2    0    —

Here a_ij > 0 if the relation exists between
i and j ... In other ... The sign of ...
then be used to indicate
whether the individual accepts
(positive) or rejects (negative) an- 
other individual. The absolute size of 
the entry is used to indicate strength 
of the response. 

Suppose, for example, that in a
three-man group a rejects c and
chooses b; b strongly chooses c; and c
strongly rejects both a and b. The
matrix would then appear as in Table 3.

TABLE 3
MATRIX WITH WEIGHTED BIPOLAR ENTRIES

           a    b    c
   a       —    1   −1
   b       0    —    2
   c      −2   −2    —

Katz (1947) has indicated some of 
the possibilities opened up by the use 
of matrices. These include tech- 
niques for writing equations describ- 
ing over-all changes in group struc- 
ture. As will be seen below, matrices 
also permit many types of complex 
analysis by relatively simple opera- 
tions. 


CONSTRUCTION OF INDICES FOR 
GROUP AND INDIVIDUAL 
CHARACTERISTICS 


Sociometry has been prolific in the
construction of indices. The meaning
of these indices is usually fairly
clear and their computation relatively
simple. A typical index is one
cited (Proctor & Loomis, 1951) as a
measure of group cohesion. The index
is the number of mutually chosen
pairs divided by n!/[(n − 2)! 2!], the
number of ways that a pair of individuals
can be drawn from a group of
n individuals. Since indices of this
type have been presented in detail by
Proctor and Loomis (1951), only
those not discussed elsewhere are
considered here.
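
The cohesion index just described is easy to compute from a binary sociomatrix. The sketch below applies it to the choices of Table 1; the function name `cohesion_index` and the layout of the matrix as nested lists (diagonal entered as zero) are assumptions made for illustration.

```python
from math import comb

# Table 1 as rows = sender, columns = receiver, in the order a, b, c, d.
# Diagonal cells are entered as 0.
CHOICES = [
    [0, 0, 1, 0],  # a chooses c
    [1, 0, 1, 1],  # b chooses a, c, d
    [0, 1, 0, 1],  # c chooses b, d
    [1, 1, 0, 0],  # d chooses a, b
]


def cohesion_index(matrix):
    """Number of mutually chosen pairs divided by n!/[(n - 2)! 2!]."""
    n = len(matrix)
    mutual = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] and matrix[j][i]
    )
    return mutual / comb(n, 2)


index = cohesion_index(CHOICES)
print(round(index, 3))
```

In Table 1 only the pairs b-c and b-d are mutual, so the index is 2 out of the 6 possible pairs.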


Group Indices 


Several indices of group charac- 
teristics have been suggested for the 
extent to which the group is centered 
on a small number of individuals. 
These indices are based on the vari- 
ance of the column sums of the choice 
matrix. One of these is the index 
of concentration suggested by Katz 
(1954). 

Hohn (1953) has developed an index
based on the ratio between obtained
variance of column sums (s²)
and the maximum possible variance
for the special case of the matrix with
weighted entries in which each individual
ranks all other members of the
group in order of preference. The
index is called the hierarchy index,
with the following formula:

\[
h = \frac{12}{n(n^2-1)(n-2)^2}\left(\sum_j c_j^2 - \frac{n^3(n-1)^2}{4}\right) \qquad [1]
\]

where c_j is the sum of column j.


Landau (1951a) earlier developed a
similar hierarchy index for another
special case, that of dominance relations.
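
The idea behind a hierarchy index of this kind, the obtained variance of the column sums taken as a fraction of its maximum, can be illustrated directly without relying on a closed formula. The sketch below rests on an assumption that should be stated plainly: the maximum is taken to occur when every member ranks the others in the same order. All function names are illustrative, and groups of at least three members are assumed.

```python
def column_sums(matrix):
    n = len(matrix)
    return [sum(row[j] for row in matrix) for j in range(n)]


def sum_sq_dev(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def agreed_ranking(n):
    """Everyone ranks the others in the same order (member 0 first),
    skipping himself. Assumed here to give the maximum variance."""
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        rank = 1
        for j in range(n):
            if j != i:
                matrix[i][j] = rank
                rank += 1
    return matrix


def cyclic_ranking(n):
    """Member i ranks i+1 first, i+2 second, and so on (no hierarchy)."""
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for step in range(1, n):
            matrix[i][(i + step) % n] = step
    return matrix


def hierarchy_index(ranks):
    """Obtained variation of column sums over the assumed maximum."""
    n = len(ranks)
    return sum_sq_dev(column_sums(ranks)) / sum_sq_dev(column_sums(agreed_ranking(n)))


h_agreed = hierarchy_index(agreed_ranking(4))
h_cyclic = hierarchy_index(cyclic_ranking(4))
print(h_agreed, h_cyclic)
```

For four members the agreed ranking gives column sums of 3, 5, 7, and 9, whose squared deviations total 20; this equals n(n² − 1)(n − 2)²/12 for n = 4. The cyclic ranking gives equal column sums and hence an index of zero.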


Individual Indices 


Much sociometric work is con- 
cerned with specifying characteristics 
of the individual in the group. For 
example, the total number of choices 
received and made by an individual 
is used to indicate his “popularity”
and his “outgoingness,” respectively.
If Table 1 were a matrix of com- 
munications, b would be the most 
productive of output and receive the 
same amount of input as the other 
members. 
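
The row and column totals referred to here can be read off Table 1 mechanically. A small sketch follows, assuming the table is stored as nested lists with senders as rows and zeroes in the diagonal; the names `sent` and `received` are illustrative.

```python
MEMBERS = ["a", "b", "c", "d"]

# Table 1: rows = sender, columns = receiver, diagonal entered as 0.
CHOICES = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]

# Row sums: number of choices (or communications) each member sends.
sent = {m: sum(row) for m, row in zip(MEMBERS, CHOICES)}

# Column sums: number each member receives.
received = {m: sum(row[j] for row in CHOICES) for j, m in enumerate(MEMBERS)}

print(sent)
print(received)
```

Member b sends three, more than anyone else, while every member receives two, which is the pattern described in the text.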

The simplest technique in describ- 
ing the individual is to use the sum of 
the rows and columns as above. This 
technique, however, makes it difficult 
to compare individuals in groups of
different sizes (Criswell, 1950; Edwards,
1948). Various refinements
have therefore been introduced, such
as weighting the sum for each individual
by the maximum number it is
possible for him to send or receive.
Another difficulty with the use of a 
simple row or column index is that it 
does not take into account indirect as 
well as direct connections between 
individuals. There may be an im- 
portant difference between the indi- 
vidual who is chosen by three group 
members who are themselves chosen 
by many others and the individual 
who is chosen by three “isolates.” 


Status Index 


Katz (1953) has developed a status 
index that takes account of such in- 
direct links. He makes use of the fact 
that in a matrix with binary entries, 
as in Table 1, the powers of the 
matrix give the number of indirect 
connections to each member of the 
group. Thus, if the matrix in Table 1 
has zeroes placed in the diagonal and 
is squared, that is, multiplied by it- 
self, the matrix in Table 4 is ob- 
tained. The entries in the squared 
matrix indicate the number of two- 
link connections between each pair of 
group members. For example, there
is one two-link connection between
a and b (a→c→b), none between a
and c, and two between c and a (c→b→a
and c→d→a). Cubing the original
matrix would give a matrix all
of whose entries are positive, indicating
that every member of the group
has one or more three-link connections
with every other group member.

TABLE 4
MATRIX OF TWO-LINK CONNECTIONS

           a    b    c    d
   a       0    1    0    1
   b       1    2    1    1
   c       2    1    1    1
   d       1    0    2    1
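
The squaring and cubing described above can be reproduced with ordinary matrix multiplication. This is a minimal sketch using plain nested lists; the helper name `matmul` is an assumption, and the matrix is Table 1 with zeroes in the diagonal.

```python
def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Table 1 with zeroes in the diagonal (rows = sender, order a, b, c, d).
A = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]

A2 = matmul(A, A)   # two-link connections
A3 = matmul(A2, A)  # three-link connections

for row in A2:
    print(row)
print(all(entry > 0 for row in A3 for entry in row))
```

The squared matrix shows one two-link path from a to b, none from a to c, and two from c to a, and every entry of the cubed matrix is positive.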
The status index, T, for each indi- 
vidual may be computed as the total 
of all direct and indirect links to the 
individual. These may be obtained 
from the column sums of the original 
matrix, plus those of all the powers 
of the matrix. One might consider, 
therefore, the column sums of 


\[
T = A + A^2 + A^3 + \cdots + A^k + \cdots = (I - A)^{-1} - I \qquad [2]
\]

where A is the original sociometric
matrix with zeroes in the diagonal
and I is the identity matrix.

Katz suggests, further, that indirect
links be weighted inversely to
the number of links involved. To do
this a constant c is employed with
0 < c < 1. The following formula then
gives the matrix of summed and
weighted values:

\[
T = cA + c^2A^2 + \cdots + c^kA^k + \cdots = (I - cA)^{-1} - I \qquad [3]
\]

Katz derives an equivalent formula 
for the computation of status that 
finds the solution through a set of 
linear equations rather than by find- 


ing the inverse of a matrix explicitly. 


\[
\left(\frac{1}{c}\,I - A'\right) t = s \qquad [4]
\]

where A′ is the transpose of A, and
s is a column vector whose entries
are the column sums of A. The formula
yields a set of linear equations
which may be solved for t, the sums of
the columns of T. Katz also presents
a formula for the weighting of t according
to the number of possible
choices in the group.

Leontief Matrices

Hubbell, by working explicitly
with the inverse matrix, develops
much more extensive information
from an approach similar to Katz.
Hubbell points out that matrices
summarizing relationships within a
group have been used by Leontief in
his input-output or interindustry
models. He therefore transfers the
Leontief techniques to the sociomatrix.³
The techniques are aimed at
tracing the long-run effect of each
member upon the others. In order to
do this, use is made, as Katz does
above, of the fact that (I − A)⁻¹ gives
the sum of all the powers of A.

An example of the approach pre- 
sented by Hubbell is the following: 
Suppose the matrix in Table 1 sum- 
marizes the relationships in a group. 
If the positive entries in the matrix
are all changed to .25, this is the same
as multiplying the matrix⁴ by Katz’s
constant, c = .25. The inverse of
I − cA is given in Table 5. The inverse
matrix can be taken to summarize
the eventual effect of each
member on any other. Thus, one
unit of activity on the part of b will 
eventually give rise to .40 units by a. 
The row sums indicate the long-run 
influence of each member. In this 
case, the row sum for b is the largest 
and that for a is the smallest. The 
column sums, here equal, may be 
taken to signify the amount of con- 
straint or pressure put on each mem- 
ber in the long run. 

TABLE 5
MATRIX OF $(I-cA)^{-1}$

        a      b      c      d
a     1.05    .09    .28    .09
b      .40   1.20    .40    .40
c      .19    .38   1.14    .38
d      .36    .32    .17   1.12

3 "Sociomatrix" will be used here and below instead of the longer
"sociometric matrix."

4 This weighting could be rationalized by assuming that each member of
the group gives only half his attention to what goes on within the
group and that he distributes his attention equally among the inputs
from other group members. The column sums are all, therefore, taken to
equal .50, and the values in the cells changed accordingly.

Up to this point, the work is the same as that outlined by Katz (1953)
with one major difference. The complete matrix of long-run influence is
obtained. (Other differences such as
not subtracting the identity matrix 
from the inverse matrix are unim- 
portant.) The use of the inverse 
matrix is significant for two reasons. 
First, it becomes easier to change the 
analysis from row sums to column 
sums. Second, it becomes possible to 
estimate the effect of various distri- 
butions of input from the external en- 
vironment. Thus, if each member re- 
ceives one unit of external input 
which he relays to members of his 
group, the eventual effect of each
member can be computed by postmultiplying
$(I-cA)^{-1}$ by a column
vector of ones.⁵ The column vector
obtained is (1.51, 2.40, 2.09, 1.97),
indicating that b is most influential.
The effect of differences in the 
amount of external input can be sim- 
ilarly computed by postmultiplying 
with the appropriate column vector 
containing a different distribution of 
inputs. 
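The Hubbell example can be retraced numerically. The sketch below assumes a 4-person choice matrix `A` for members a, b, c, d. Table 1 is not reproduced in this section, so the matrix is an inference from the row vector (0, 0, 1, 0) quoted for the first member and from the values of Table 5, and it should be read as an assumption. The helper names `inverse` and `effect_of_input` are hypothetical.

```python
from fractions import Fraction


def inverse(m):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Assumed Table 1 (rows = chooser, columns = chosen), members a, b, c, d.
A = [[0, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]
c = Fraction(1, 4)
n = len(A)

# M = (I - cA)^{-1}, the matrix tabulated in Table 5
M = inverse([[int(i == j) - c * A[i][j] for j in range(n)] for i in range(n)])

row_sums = [sum(row) for row in M]                              # long-run influence
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]   # long-run pressure


def effect_of_input(inputs):
    """Postmultiply (I - cA)^{-1} by a column vector of external inputs."""
    return [sum(M[i][j] * inputs[j] for j in range(n)) for i in range(n)]


print([round(float(x), 2) for x in row_sums])
print([round(float(x), 2) for x in effect_of_input([2, 1, 1, 0])])
```

With unit inputs the result is simply the vector of row sums, and b comes out as the most influential member, in line with the text. The input vector (2, 1, 1, 0) is an arbitrary illustration of an unequal distribution of external input.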

It is also possible to assign differential weights to the likelihood
of influence (or communication) travelling along a particular channel.
For example, it may be that c pays more attention to a than to b. In
this case, values in cells $a_{13}$ and $a_{23}$ might be .42 and .08,
respectively, instead of .25. When the inverse of I minus this new
matrix is computed, it is found that the effects of the change are to
increase the long-run influence of a and d and decrease that of b
and c.
5 This simply obtains the row sums dis- 
cussed in the preceding paragraph. 
This technique is therefore of special
interest in analyzing groups with
respect to the interrelation of the 
members and the possible effects 
of changes in demands, pressures, or 
environmental contacts. It may be 
possible to pick out cases in which 
the imposition upon group members 
of a given distribution of demands 
from the environment so overloads 
some members of the group as to 
cause a breakdown. Changes in the 
distribution of demands or the struc- 


ture of the group may obviate this 
overloading. 


ENUMERATION OF STRUCTURES


Work has been done in counting 
the number of graphs or matrices 
that display specified characteristics 
such as a given set of row and column 
totals. The work on enumeration is 
important in obtaining chance dis- 
tributions of various types of struc- 
tures. The work cited in this section 


concerns only matrices with binary 
entries. 


Number of Matrices with Given Row 
and Column Sums 


Katz and Powell (1954) attack the following enumeration problem:
Given a matrix with certain row and column totals, how many distinct
matrices with the same set of row and column totals in the same order
may be generated?

Solutions have been worked out and tables constructed by Sukhatme
(1938), and by David and Kendall (1951), for matrices with binary
entries. In the case of the sociomatrix, a restriction is usually
introduced. The entries on the diagonal are either all zeroes or all
ones. Katz has developed a formula that permits the use of the tables
for these special cases.

The number of matrices (η) with zeroes in the diagonal cells, and a
fixed set of column sums (s) and row sums (r) is given by


$$\eta(s, r) = A\left\{\prod_{i=1}^{n} (1+\delta_i)^{-1}\,(s, r)\right\} \qquad [5]$$

The operator $\delta_i$ reduces entry i of the vectors of row and
column sums by one. A is the number of all matrices, unrestricted with
respect to the diagonal, that can generate the given set of column and
row sums.

The formula for η(s, r) is easy to apply since powers of $\delta_i$
have to be considered only up to $m_i$, where $m_i$ is the smaller of
$(s_i, r_i)$. This is expanded and the individual terms are evaluated
using the tables mentioned above to evaluate the various A's obtained
through the formula.
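The expansion of Equation [5] can be checked by brute force on very small matrices. The sketch below reads $(1+\delta_i)^{-1}$ as the alternating series $1 - \delta_i + \delta_i^2 - \cdots$, which is an assumption about how the operator is meant to be expanded, and it replaces the published tables by direct enumeration. The function names are hypothetical.

```python
from itertools import product


def count_matrices(row_sums, col_sums, zero_diagonal):
    """Count binary matrices with the given row and column sums by enumeration."""
    n = len(row_sums)
    if min(row_sums) < 0 or min(col_sums) < 0:
        return 0
    total = 0
    for bits in product((0, 1), repeat=n * n):
        m = [bits[i * n:(i + 1) * n] for i in range(n)]
        if zero_diagonal and any(m[i][i] for i in range(n)):
            continue
        if [sum(row) for row in m] != list(row_sums):
            continue
        if [sum(m[i][j] for i in range(n)) for j in range(n)] != list(col_sums):
            continue
        total += 1
    return total


def eta(row_sums, col_sums):
    """Zero-diagonal count obtained from unrestricted counts A via the delta operators."""
    limits = [min(r, s) for r, s in zip(row_sums, col_sums)]  # the m_i of the text
    total = 0
    for ks in product(*(range(m + 1) for m in limits)):
        sign = (-1) ** sum(ks)
        reduced_rows = [r - k for r, k in zip(row_sums, ks)]
        reduced_cols = [s - k for s, k in zip(col_sums, ks)]
        total += sign * count_matrices(reduced_rows, reduced_cols, False)
    return total


print(eta((1, 1, 1), (1, 1, 1)))
```

For three members who each make and receive one choice, the unrestricted count A is 6 and the zero-diagonal count is 2.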


Number of Distinct Structures 


A different enumeration problem 
has been attacked by Davis (1953). 
Given a set of n elements, how many 
distinct structures of relationship are 
possible between them? Two struc- 
tures are distinct if they are not per- 
mutations of each other. In terms of 
matrices, two matrices are distinct if 
one cannot be obtained from the 
other by a simultaneous permutation 
of rows and columns. Davis develops 
formulae not only for the counting of 
the number of distinct structures, but 
also for the counting of specific kinds 
of relation structure, e.g., reflexive, 
symmetric, asymmetric, etc. The 
bounds for the number of structures 
are given as follows: 


$$2^{n^{2} - \log_{2} n!} < \text{number of distinct structures} < 2^{n^{2}} \qquad [6]$$

Davis, in a later paper (1954), applied his formulae for the counting
of structures to the special case of dominance relations.

6 Davis' paper concerns n-adic as well as dyadic structures. Only the
dyadic structures are considered here. See also Copi and Harary.
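The bounds in [6] can be illustrated for very small n by direct enumeration. The sketch below counts relation matrices (diagonal entries allowed) up to simultaneous permutation of rows and columns. It is a brute-force check, not Davis' counting formula, and the function name is hypothetical.

```python
from itertools import permutations, product
from math import factorial


def distinct_structures(n):
    """Count n x n binary matrices up to simultaneous row/column permutation."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    seen = set()
    count = 0
    for bits in product((0, 1), repeat=n * n):
        if bits in seen:
            continue  # already reached as a permutation of an earlier matrix
        count += 1
        m = dict(zip(cells, bits))
        for p in permutations(range(n)):
            seen.add(tuple(m[(p[i], p[j])] for i, j in cells))
    return count


for n in (2, 3):
    upper = 2 ** (n * n)
    lower = upper / factorial(n)  # equals 2 ** (n*n - log2(n!))
    print(n, lower, distinct_structures(n), upper)
```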


Distribution of Subgroup 
Configurations 


Closely related to the enumeration 
of group structures is work on the 
distribution of various subgroup 
structures. A large number of papers 
deal with the expected number and 
the distributions of various configura- 
tions on the basis of chance. This 
work has been done for stars (wheels), 
chains, rings (circles), various types 
of cliques and isolates. Much of this 
work has been carried forward by 
Katz (1952a; Katz & Olkin, 1952; 
Katz & Powell, 1957). It has im- 
portance in evaluating the results of 
the analysis of configurations. For 
example, if a given group has three 
isolates, it is of interest to discover 
how often this could have arisen by 
chance. However, a psychological
theory or rationale to dictate the 
choice of configurations for study and 
to indicate why there should be de- 
partures from chance ordering has 
not yet been developed. 

If and when theory and experi- 
mental investigation develop in which 
group structure is a dependent varia- 
ble, then the distribution of particu- 
lar configurations will have consider- 
able importance. At the present time, 
however, configurations and their 
distribution seem to have little the- 
oretical or practical significance. 


COMPARISON OF GROUPS


The question of the similarity of 
two matrices is important for either 
the evaluation of the amount of 
change in a group or the departure of 
the actual pattern from an ideal or re- 
quired pattern. The techniques pre- 
sented thus far for measuring sim- 
ilarity are basically correlational. 


They involve pairing of correspond- 
ing entries in two matrices and cor- 
relating the values found. 


Comparison of Matrices Using Cell 
Entries 

When a matrix has binary entries 
there are only four ordered pairs pos- 
sible for the corresponding cells: (0, 
0), (1, 0), (0, 1), and (1, 1). Katz and 
Powell (1953) therefore use a four- 
fold table to summarize the agree- 
ments and disagreements of entries in 
two binary matrices. They then con- 
struct an index of conformity based 
upon the observed cell frequencies: 


$$\hat{\Gamma} = \frac{1}{n_A n_{\bar{B}}}\left[n(n-1)\,n_{AB} - n_A n_B\right] \qquad [7]$$

where n(n−1) = number of off-diagonal cells, $n_A$ = number of positive
entries in matrix A, $n_B$ = number of positive entries in B,
$n_{\bar{B}}$ = number of zero entries in B, and $n_{AB}$ = number of
cells positive in both A and B. The index ranges between 1 and
$-(n_B/n_{\bar{B}})$ and equals 0 when A and B are independent. The
$\hat{\Gamma}$ for the agreement of A with B and B with A is not
necessarily the same. For situations in which it is not assumed that
one of the matrices is antecedent, the geometric mean of the two
possible indices may be computed. The index is called C, a coefficient
of concordance.

$$C = \frac{n(n-1)\,n_{AB} - n_A n_B}{\sqrt{n_A n_{\bar{A}} n_B n_{\bar{B}}}} \qquad [8]$$

Here $n_{\bar{A}}$ is the number of zero entries in A. C ranges between
+1 and −1.

Katz and Powell point out that the same approach can be used to
compare an individual's choices within matrices A and B (both
describing, of course, the same group). A fourfold table is
constructed on the basis of two corresponding rows of the matrices and
the same indices computed.
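A minimal sketch of the two indices follows, under the assumption that all counts are taken over off-diagonal cells only. The 3-person matrices `A` and `B` are hypothetical, as are the function names. The directed index is computed from the fourfold counts as $[n(n-1)n_{AB} - n_A n_B]/(n_A n_{\bar{B}})$, and C is the same numerator divided by the square root of the four marginal counts.

```python
from math import sqrt


def fourfold_counts(a, b):
    """Return (number of off-diagonal cells, n_A, n_B, n_AB) for binary matrices."""
    n = len(a)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    n_a = sum(a[i][j] for i, j in cells)
    n_b = sum(b[i][j] for i, j in cells)
    n_ab = sum(a[i][j] * b[i][j] for i, j in cells)
    return len(cells), n_a, n_b, n_ab


def conformity(a, b):
    """Index of conformity of A with B."""
    total, n_a, n_b, n_ab = fourfold_counts(a, b)
    return (total * n_ab - n_a * n_b) / (n_a * (total - n_b))


def concordance(a, b):
    """Coefficient of concordance C, symmetric in A and B."""
    total, n_a, n_b, n_ab = fourfold_counts(a, b)
    return (total * n_ab - n_a * n_b) / sqrt(
        n_a * (total - n_a) * n_b * (total - n_b))


# Hypothetical matrices: every choice in A also appears in B.
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
B = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]

print(conformity(A, B), conformity(B, A), concordance(A, B))
```

Because every positive entry of A is also positive in B, the index of A with B reaches its maximum of 1, while the index of B with A is only .5. C is the geometric mean of the two.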
Comparison of Matrices Using Row or
Column Sums


Another type of comparison of matrices can be made by focusing on
the row or column sums. This is easily accomplished by computing a
product-moment correlation coefficient for the paired row sums or the
paired column sums.

Hohn (1953) has discussed in detail the comparison of sociomatrices
for the special case in which each individual ranks all others and
zeroes are inserted in the diagonal. For this case the product-moment
correlation between column sums may be written as follows:

$$\rho = \frac{\sum_i s_i s_i' - \dfrac{n^{3}(n-1)^{2}}{4}}{\sqrt{\left[\sum_i s_i^{2} - \dfrac{n^{3}(n-1)^{2}}{4}\right]\left[\sum_i s_i'^{2} - \dfrac{n^{3}(n-1)^{2}}{4}\right]}} \qquad [9]$$

where $s_i$ and $s_i'$ are the column sums for the ith column of the
two matrices.

Similar correlations could be obtained for row sums or column sums
in the case of any type of sociomatrix, i.e., matrices with any type
of weighted entries. In these cases, however, the general formula for
the product-moment correlation coefficient should be used.


Generalization of Correlational Approach


Katz (1947) has outlined a general method for evaluating agreement
between the patterns of individual choices. The measure of agreement
considered is based on the angle between vectors of choices, such as
(0, 0, 1, 0), the first row vector in Table 1 (when zero is inserted
in the diagonal cell).

The cosine of the angle between two vectors α and β is equal to their
inner product, divided by the product of their magnitudes.

$$\cos\theta_{\alpha\beta} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^{2}}\,\sqrt{\sum_i b_i^{2}}} \qquad [10]$$

When a and b are deviations from their respective means, this is the
standard correlation coefficient. To make use of either, it is
necessary to assign values to the diagonal entries in the matrix. (It
would probably be best to assign the highest possible positive cell
value to all cells in the diagonal.) It is then possible to measure
the agreement between the vector of responses generated by each
individual in the group with that of every other member of the group.
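The angle measure is easy to compute directly. The sketch below applies it to two choice vectors; the first is the row vector (0, 0, 1, 0) quoted in the text, and the second, (1, 0, 1, 1), is an assumed row for another member. Zeroes are left in the diagonal cells for simplicity, which is one possible convention and not the one recommended above. Function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Inner product of u and v divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def correlation(u, v):
    """Cosine of the vectors after each is expressed as deviations from its mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([x - mean_u for x in u], [y - mean_v for y in v])


row_a = (0, 0, 1, 0)
row_b = (1, 0, 1, 1)  # assumed second row, for illustration only

print(cosine(row_a, row_b), correlation(row_a, row_b))
```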


ANALYSIS OF SUBGROUPS


A large group often is assumed to
have distinguishable subgroups 
within it. In certain organizations 
these subgroups are officially desig- 
nated, e.g., the board of directors of 
a company. In other organizations, 


they are neither official nor immedi- 
ately obviou 


factions wit 
There have been S! 
move from the 
individuals to 
segregation of subgroups on the basis 
of clear-cut criteria, The gener; 
groupings (of men and machines). It
is also of interest to the student of 
administrative organization, for ex- 
ample, to compare officially desig- 
nated subgroups with the actually 
functioning subgroups in an or- 
ganization. 


Matrix Manipulation and Reduction 
Method 


There are several techniques available for the analysis of subgroups.
The first one suggested was that of Forsyth and Katz (1946) for
matrices with bipolar entries ($a_{ij}$ = 0 or +1 or −1). It can also
be applied to the simpler case with binary entries ($a_{ij}$ = 0 or
+1).

The Forsyth-Katz technique in- 
volves simultaneously switching the 
rows and columns of the matrix so 
that positive entries cluster about the 
diagonal, i.e., their distance from the 
diagonal is minimized. If there are 
negative entries their distance from 
the diagonal is maximized. The tech- 
nique helps identify cliques (clus- 
ters), leaders (center of cluster), and 
rank of cliques. The steps may be 
summarized as follows: 

1. Place in Rows 1 and 2 (and Columns 1 and 2) a pair of individuals
who choose each other.

2. Place in Row 3 (and Column 3) an individual who has mutual choices
with both or, failing that, is chosen by both of the preceding pair.

3. Continue adding individuals in this fashion, using as criterion the
requirement that each new member is chosen by at least 50% of the
people already included in the subgroup.

4. Remove the subgroup when no more can be added and repeat the
process with the reduced matrix.


After all the subgroups have been selected in this fashion, Forsyth
and Katz specify methods for arranging the entire group along the
diagonal (so that subgroups that reject each other are most widely
separated) and for placing individuals who have not been included in
any subgroup. The final manipulation is to arrange the members of each
subgroup so that those receiving the greatest number of choices are at
the center. The Forsyth-Katz technique requires rather awkward
manipulation of matrices. Katz later developed a punched card
technique for this analysis (1950). Methods for expediting the
rearrangements can also be adapted from the scalogram board technique
developed for Guttman's scalogram analysis.


Diagonal Maximization Method


Beum and Brundage (1950) have 
developed a more systematic tech- 
nique for carrying out the rearrange- 
ment of a matrix with positive 
weighted entries. The technique is 
based on the following idea. 

"It can be shown that if weights are assigned to the rows of a
sociomatrix ... and the average product of the elements in each column
and the corresponding weights is maximized for each column, the sum of
the squares of the elements about the principal diagonal is
minimized." The technique as outlined below requires that the system
of weighting individual choices or relationships gives the heaviest
weight to the most important choices. The steps are as follows:

1. Zeros are inserted in the diagonal cells and column sums are
obtained.

2. Weights of 1 to N are assigned to the rows of the matrix, beginning
with 1 as the weight of the bottom row.

3. All matrix entries are multiplied by their row weight and weighted
column sums are obtained.

4. The weighted column sums are divided by the unweighted column sums
to obtain the average weight.

5. The matrix is then rearranged in order of average weights so that
the column with the largest average is moved to the extreme left and
the corresponding row is moved to the top. The column with the next
largest average is placed second from the left, etc.


The procedure is repeated until further iteration does not change the
arrangement or results only in an alternating arrangement of the
columns. The procedure is related to that used in obtaining a latent
vector of the matrix. Its success depends on the absence of overlap in
the subgroups.


Matrix Multiplication Method


A technique designed for the analysis of subgroups in binary matrices
has been presented by Festinger (1949) and developed extensively by
Luce and Perry (1949). The technique as first proposed analyzed
"cliques." A clique was defined as a subgroup of three or more members
all of whom were symmetrically related. The clique includes all
members who meet the requirement of symmetrical relationship. For
example, an individual is a clique member if he communicates with
every member of the group and every member of the group communicates
with him. This stringent requirement is relaxed in a later development
by Luce (1950).


In this method, the matrix summarizing the group's relationships is
reduced to an S-matrix consisting of all entries which are symmetric
about the diagonal. This means eliminating all but mutual choices from
the original matrix. The new S-matrix is then squared and cubed. The
diagonal entries of $S^{2}$ give the number of mutual friends for each
individual. The diagonal entries of $S^{3}$ indicate whether an
individual is a clique member. The magnitude of the off-diagonal terms
in both $S^{2}$ and $S^{3}$ indicates the compactness of the entire
group.

The following theorems are pre- 
sented and proved by Luce and Perry. 
1. The magnitude of the cell entries $a_{ij}$ in $A^{n}$ indicates the
number of distinct n-chains from i to j.⁷ An n-chain is a set of (n+1)
interrelated (interconnected) members. Thus, a speaks to b who speaks
to c is a 2-chain.

2. A group member i has a main diagonal value of m in $A^{2}$ if, and
only if, he has a symmetric relationship with m members.

3. A member i is contained in a clique if, and only if, his main
diagonal entry in $S^{3}$ is positive.

4. If there are t diagonal entries in $S^{3}$ each of which equals
(t−2)(t−1) and the remaining diagonal entries are zero, then the t
individuals form a clique.
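The matrix operations behind these theorems can be sketched as follows. The 5-person choice matrix is hypothetical: members 0, 1, and 2 choose one another, members 3 and 4 choose each other, and two choices (1 to 3, 4 to 0) are not reciprocated. Helper names are hypothetical as well.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def symmetric_part(a):
    """Keep only mutual choices; the diagonal is set to zero."""
    n = len(a)
    return [[a[i][j] * a[j][i] if i != j else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical choice matrix (rows = chooser, columns = chosen).
A = [[0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [1, 0, 0, 1, 0]]
n = len(A)

S = symmetric_part(A)
S2 = matmul(S, S)
S3 = matmul(S2, S)
A2 = matmul(A, A)

mutual_counts = [S2[i][i] for i in range(n)]             # theorem 2
clique_members = [i for i in range(n) if S3[i][i] > 0]   # theorem 3

# Theorem 4: t positive diagonal entries of S^3, each equal to (t-2)(t-1)
t = len(clique_members)
single_clique = all(S3[i][i] == (t - 2) * (t - 1) for i in clique_members)

print(mutual_counts, clique_members, single_clique)
```

The mutual pair 3 and 4 has positive entries in the diagonal of $S^{2}$ but zero entries in $S^{3}$, so it is not counted as a clique under the three-or-more definition.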


Luce and Perry also present a general formula relating the magnitude
of entries for member i in the diagonal of $S^{3}$ to the number,
overlap, and size of the cliques of which i is a member. The formula
and its proof are, however, incorrect. They apply only to special
conditions of overlap in cliques.


On the basis of the theorems, the following procedure may be used to
analyze a group:

1. Find the row in $S^{3}$ with the smallest or one of the smallest
diagonal entries.

2. Choose those members whose entries for [...] highest. The number
chosen will depend on the size of the [...] (If the diagonal entry
[...] (5−1) then the 4 high[...]

[...] (or equal) diagonal entry until each of the clique memb[...] a
clique.

5. Check the final set of cliques to make sure that each clique member
is included in all cliques to which he belongs.

[...] appears more than once. Thus, a redundant 2-chain would be a
speaks to b who speaks to a. The work on redundancies, which is
necessary for the identification and enumeration of chains and circles
in the group, has been carried further by Katz (1952b) and Ross and
Harary (1952).

7 This relationship is the basis for Katz' status index. See Table 4.


Generalized Matrix Multiplication 
Method 


In a subsequent article, Luce (1950) generalized the method by
relaxing the definition of clique. He defined an n-clique as a
subgroup of individuals who are all n or fewer links (choices,
communication links) from each other. This extends greatly the utility
of the approach.

The matrix manipulation for the analysis of n-cliques is closely
related to the operations given above for the analysis of cliques.
(They would now be called one-cliques.) Only the first few steps are
different.

1. Compute the matrix powers $A, A^{2}, A^{3}, \ldots, A^{n}$.

2. Add these together.

3. Take the cells of the summed matrix, replace every positive entry
with a one, and replace all diagonal entries with a zero.

4. From this point on, proceed as in the original matrix
multiplication method.
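The four steps above can be sketched in a few lines. The example group is hypothetical: four members arranged in a ring of mutual choices, so that no three of them are all directly linked. The function names are also hypothetical, and the final step uses only the positive-diagonal test on the cube of the symmetric matrix, not the full clique-sorting procedure.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reach_within(a, n_links):
    """Steps 1-3: sum A, A^2, ..., A^n, then reduce to a 0/1 matrix with zero diagonal."""
    size = len(a)
    power = [row[:] for row in a]
    total = [row[:] for row in a]
    for _ in range(n_links - 1):
        power = matmul(power, a)
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
    return [[1 if total[i][j] > 0 and i != j else 0 for j in range(size)]
            for i in range(size)]


def n_clique_members(a, n_links):
    """Step 4: members with a positive diagonal entry in the cube of the S-matrix."""
    size = len(a)
    reach = reach_within(a, n_links)
    s = [[reach[i][j] * reach[j][i] for j in range(size)] for i in range(size)]
    s3 = matmul(matmul(s, s), s)
    return [i for i in range(size) if s3[i][i] > 0]


# Hypothetical ring: 0-1, 1-2, 2-3, 3-0, all choices mutual.
RING = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]

print(n_clique_members(RING, 1), n_clique_members(RING, 2))
```

With one link allowed the ring contains no clique. With two links allowed every member is within reach of every other, and all four appear as members of a 2-clique.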

The techniques developed by Luce 
have the advantage of requiring rela- 
tively simple procedures. They have 
one major disadvantage in that they 
severely limit the type of information 
they can handle. As developed, they 
can deal only with the presence or ab- 
sence of a relationship, i.e., binary en- 
tries. Indications of degree of rela- 
tionships (e.g., likes very much or 
communicates infrequently) cannot 
be handled. It is, of course, possible 
to reduce weighted entries to binary 
entries by dichotomizing them ac- 
cording to some criterion. 

An example of the successful application of the matrix multiplication
method may be found in a study by Chabot (1950). In the study, groups
in an industrial situation were analyzed to test hypotheses concerning
the relation of group membership to production.


Vector Analytic Methods 


It was noted earlier that the inner 
product of vectors of choices can be 
used to compute a correlation co- 
efficient. By taking each row vector 
that makes up the matrix of relation- 
ships, the matrix can be easily con- 
verted into a matrix of correlations. 
The entire body of factor analytic 
techniques can then be brought to 
bear upon the data. 

Bock and Husain (1950) have presented another special type of factor
analytic approach. They start with a choice matrix summarizing the
subject's ranked preference for other group members. On the basis of
the rank and whether or not the choices are mutual, weights are
assigned to each cell entry. These weights are then analyzed by means
of Holzinger's B-coefficient technique.

Rather than converting the vectors 
to correlation coefficients, and then 
following standard factor analytic 
procedures, it is more appropriate to 
deal directly with the original set of 
vectors that compose the socioma- 
trix. The steps then would involve 
setting up a basis for the space 
spanned by the vectors, and then 
rotating the basis to give some satis- 
factory fit. 

Viewing the choice or communication vectors as an arrangement of
vectors in a space leads to certain other ideas. In this arrangement,
the number of dimensions in the vector space is related to the
homogeneity of the group. A completely homogeneous group would have
dimension equal to one. Everybody would choose everybody else, and
would have the choice vector (1, 1, 1, ..., 1), if the diagonal
entries are taken as equal to 1. All vectors would then fall on the
equiangular line. The equiangular line then gives a baseline for
measuring the homogeneity of the group. If the group is completely
heterogeneous, then each individual chooses or communicates only to
himself. The vectors are then in an n-dimensional Euclidean vector
space with the basis (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), etc., in
which n = the number of individuals. It can be reasoned further that
if the group consists of several subgroups, then the number of
subgroups will determine the dimensionality of the vector space. The
number of dimensions required can be ascertained by solving for the
number of nonzero latent roots of the matrix.
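The dimensionality argument can be illustrated on three idealized groups. The sketch below computes the rank of the matrix by exact elimination instead of solving for latent roots; for the symmetric matrices used here the rank equals the number of nonzero latent roots, but that equality is not guaranteed for an arbitrary asymmetric sociomatrix. The matrices and the function name are hypothetical.

```python
from fractions import Fraction


def rank(m):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(len(rows)):
            if i != found and rows[i][col] != 0:
                factor = rows[i][col] / rows[found][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[found])]
        found += 1
    return found


homogeneous = [[1] * 4 for _ in range(4)]            # everybody chooses everybody
isolated = [[int(i == j) for j in range(4)] for i in range(4)]  # only self-choices
two_subgroups = [[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]]                        # two closed pairs

print(rank(homogeneous), rank(two_subgroups), rank(isolated))
```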
The advantages of a factor or vector analytic approach are the
following:

1. The position of an individual in relation to all the subgroups or
cliques may be described.

2. Use can be made of data in which different degrees of relationship
are expressed.


It should be noted, however, that this type of approach involves a
different definition of clique or subgroup. Individuals are members of
the same clique if their patterns of relationship to other individuals
are similar. It may be, therefore, that the Luce-Perry type of
analysis is best suited for certain types of relationships
(communicates with, influences) whereas a vector analytic approach is
best suited for other relationships (chooses, finds himself similar
to).

The practical implications of analysis of a group into subgroups
should be underlined. The analysis is necessary whenever it is desired
to group together individuals who interact (e.g., hand each other
tools) most frequently. An example of this type of problem in an
industrial setting may be found in the discussion by Chapanis, Garner,
and Morgan (1949) on rearrangement of men and equipment in a shop. The
application of the technique to other work situations, such as offices
with communication problems, is obvious.


ASSIGNMENT OF INDIVIDUALS TO
SUBGROUPS


The analysis of subgroups is pre- 
liminary to the planned location or 
relocation of individuals in sub- 
groups. It leads naturally, therefore, 
to the following general question. 
How can the individuals in a larger 
group be divided into subgroups on 
the basis of the relationships between 
the individuals? 

Hotelling (1954) raised the issue in 
terms of the best way to assign indi- 
viduals to a given number of teams, 
so that their sociometric choices de- 
termine their placement as much as 
possible. He set up the following pro- 
cedure for computing the total satis- 
faction of the group under various 
groupings. First the matrix of choices is examined and then an
assignment matrix J is constructed. The assignment matrix has entries
1 or 0 indicating whether an individual i has been assigned to team j.
For example, given the choice matrix in Table 1, with zeroes on the
diagonal, it may be desired to separate the group into two subgroups
of two men each. Two possible arrangements are indicated in the
assignment matrices $J_1$ and $J_2$ in Table 6. In $J_1$, a and b are
assigned to one subgroup and c and d to the other. In $J_2$, a and d
are placed together and b and c are placed together. The good fortune
of individuals is defined in terms of J'AJ, the choice matrix
postmultiplied by the assignment matrix and premultiplied by the
transpose of the assignment matrix.

TABLE 6
ASSIGNMENT MATRICES AND SATISFACTION MATRICES

              X  Y                     X  Y
          a   1  0                 a   1  0
  J_1 =   b   1  0         J_2 =   b   0  1
          c   0  1                 c   0  1
          d   0  1                 d   1  0

  J_1'AJ_1 =  1  3         J_2'AJ_2 =  1  2
              3  1                     3  2

The total satisfaction of the individuals in each group is indicated
by the sum of the values in the principal diagonal. Thus $J_2$ with
the entries on the diagonal summing to three would be considered a
better assignment than $J_1$ where the J'AJ diagonal sum is two. This
means that in $J_2$ three out of the eight positive choices fall
within the assigned subgroups.
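The satisfaction computation can be sketched as below. The choice matrix `A` is an assumption: Table 1 is not reproduced in this section, and the matrix used here was inferred from the row vector (0, 0, 1, 0) quoted for the first member and from the values of Table 5. It does contain eight positive choices, as the text requires. Function names are hypothetical.

```python
# Assumed Table 1 (rows = chooser, columns = chosen), members a, b, c, d.
A = [[0, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]

# Assignment matrices: rows = members a..d, columns = the two teams.
J1 = [[1, 0], [1, 0], [0, 1], [0, 1]]   # a, b together; c, d together
J2 = [[1, 0], [0, 1], [0, 1], [1, 0]]   # a, d together; b, c together


def satisfaction_matrix(a, j):
    """Compute J'AJ: choices directed from each team to each team."""
    n, teams = len(a), len(j[0])
    return [[sum(j[p][s] * a[p][q] * j[q][t]
                 for p in range(n) for q in range(n))
             for t in range(teams)]
            for s in range(teams)]


def total_satisfaction(a, j):
    """Sum of the principal diagonal of J'AJ (choices falling within teams)."""
    sat = satisfaction_matrix(a, j)
    return sum(sat[t][t] for t in range(len(sat)))


print(total_satisfaction(A, J1), total_satisfaction(A, J2))
```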

Katz has begun work on situations 
in which the sizes of the subgroups 
are permitted to vary (1952b). Work 
has also begun on the development of 
techniques to maximize the sum of 
entries on the diagonal of the satis- 
faction matrix (Katz, Olkin, & Powell, 
1952). This is a crucial problem because
the number of ways to partition
n individuals into k subgroups with
m members each is $n!/(m!)^{k}$. This
easily becomes quite large. For example,
there are $12!/(4!)^{3}$ or 34,650
ways to partition 12 individuals into
3 groups of 4 members each. This
problem has similarities to the per- 
sonnel classification problems that 
can be solved by linear programming. 
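The size of the search space is quickly checked. The sketch below evaluates $n!/(m!)^{k}$ with n = km and confirms it by brute-force enumeration on a small case. The count treats the teams as labeled, which is an interpretation consistent with the figure of 34,650 in the text. Function names are hypothetical.

```python
from itertools import product
from math import factorial


def labeled_partitions(k, m):
    """Ways to split k*m individuals into k labeled teams of m members each."""
    return factorial(k * m) // factorial(m) ** k


def brute_force_count(k, m):
    """Count assignments of k*m individuals to k teams with exactly m per team."""
    return sum(1 for assignment in product(range(k), repeat=k * m)
               if all(assignment.count(team) == m for team in range(k)))


print(labeled_partitions(3, 4))
print(labeled_partitions(3, 2), brute_force_count(3, 2))
```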


OTHER APPROACHES: GRAPH THEORY,
LOGIC OF RELATIONS

The concern with methods of describing and analyzing groups has
resulted in the attempt to bring a special branch of mathematics,
graph theory, to bear on the problems of group organization. A
nonmathematical presentation of graph theory was made in a monograph
by Harary and Norman (1953). Most of the monograph is devoted to the
discussion of definitions that have been employed in mathematical
graph theory and the relation of the definitions to psychological
concepts. For example, Lewin's concept of boundary between two regions
in a life space is translated here, as in Bavelas (1948, 1950), into a
connection between two points.

In addition, Harary and Norman 
consider a generalization of graph 
theory to adapt it to psychological 
problems. The generalization in- 
cludes the addition of such concepts 
as strength—the number of lines join- 
ing a pair of points. Strength, of 
course, has been discussed previously 
as weighted cell entries. Moreover, 
since the graph as defined in mathe- 
matical work centers on symmetric 
relationships, they also present the 
concept of the directed graph in 
which symmetry is not required. 

Although some studies have been 
done using the vocabulary of graph 
theory, its effect on psychological 
theory and experimentation has not 
been extensive to this time. Exam- 
ples of the use of the vocabulary may 
be found in the recent article by 
French on social power (1956) and in 
Weiss’ work on organizational struc- 
ture (1956). Out of the work in this 
area, however, other techniques have 
been developed for the analysis of 
subgroups. 

A point that has been stressed on the basis of graph theory
considerations concerns the role of key or liaison positions in a
group. These are positions that serve as links between subgroups,
i.e., positions whose elimination results in the group's falling into
distinct subgroups. Ross and Harary (1955) have developed techniques
for determining the liaison positions in a group. The steps in the
analysis are the following:

1. A symmetric matrix with binary entries is written describing the
group's structure. Note that here again there is the restriction to
symmetric matrices.

2. A matrix of distances between positions is constructed. This can be
computed by using powers of the original matrix. Zeros are entered in
the diagonal and one is entered in each cell with a positive entry in
the original structure matrix A. Successive powers of A are then
taken. Whenever a cell entry first increases from zero, then the power
of the matrix in which this change occurs is entered in the
corresponding cell of the distance matrix. (See Luce and Perry's first
theorem above.) For example, if $a_{ij}$ becomes positive when the
matrix is squared, a two is entered in the corresponding cell of the
distance matrix. Successive powers of the matrix are taken until all
cells are accounted for.

3. The highest number in the distance matrix is found and the
positions whose columns contain this maximum number are eliminated
from consideration as liaison positions. (These are called peripheral
points.)

4. Positions whose entry in any row is the highest for that row are
eliminated. (These are called relatively peripheral points.)

5. Positions are liaison positions whose entries in any row satisfy
the following requirements:

(a) not zero,
(b) not the maximum for that row,
(c) unique for that row.
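Step 2 of the procedure lends itself to a short sketch. The code below builds the distance matrix from successive powers of a symmetric binary matrix, and then tests each position against the definition of a liaison given in the text (a position whose elimination splits the group) by deleting it and looking for unreachable pairs. This deletion test is a stand-in for, not a reproduction of, the row criteria of steps 3 to 5. The 5-person structure and all function names are hypothetical.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def distance_matrix(a):
    """Enter in each cell the first power of A at which the cell becomes positive."""
    n = len(a)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, n):  # a shortest chain never needs more than n-1 links
        power = matmul(power, a)
        for i in range(n):
            for j in range(n):
                if i != j and dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = k
    return dist  # None marks pairs that are never connected


def splits_group(a, position):
    """True if deleting the position leaves some pair of members unconnected."""
    keep = [i for i in range(len(a)) if i != position]
    reduced = [[a[i][j] for j in keep] for i in keep]
    return any(d is None for row in distance_matrix(reduced) for d in row)


# Hypothetical structure: subgroups {0, 1, 2} and {2, 3, 4} share member 2.
A = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 1],
     [0, 0, 1, 0, 1],
     [0, 0, 1, 1, 0]]

D = distance_matrix(A)
liaisons = [k for k in range(len(A)) if splits_group(A, k)]
print(D[0], liaisons)
```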


These steps may still leave some positions unclassified. Ross and
Harary present techniques for classifying the remaining positions. The
discovery of liaison positions can have two uses. One is to
distinguish key personnel in an organization. The other is to afford a
basis for analysis of the group into subgroups or cliques. Harary and
Ross (1957) have also presented another method for the analysis of
cliques in binary matrices that covers the case of overlapping
subgroups. In order to do this they introduce the operation of
elementwise multiplication of matrices, symbolized by ×. Under this
operation, each cell in one matrix is multiplied by the corresponding
cell in the other matrix. The analysis is based on the use of
individuals who are members of only one clique, "unicliqual"
individuals. The steps in the analysis are the following:


1. The symmetric matrix, S, is obtained by
eliminating nonreciprocated cell entries.

2. The matrix S² × S is constructed. The
cell entries indicate whether a pair of individuals
belong to the same clique.

3. All rows which consist entirely of zeros
are eliminated. Their corresponding
columns are eliminated. The reduced
matrix is called M.

4. The row sums, r, and the number of cell
entries, n, greater than zero in each row
are obtained for M. If, for a given row,

r = n(n − 1),

then the individual represented by that
row is a member of only one clique. All
members of that clique can be isolated
by selecting all other individuals with a
positive entry in that row.

5. If this clique does not include all members
in M, this procedure is carried further.
The group is divided into two subgroups:
those who belong only to the
first clique, and the remainder. The procedure
described above is repeated on
the remainder subgroup.
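
Steps 1 through 4 can be sketched as follows. This is an illustration under stated assumptions rather than Harary and Ross's published routine: the function name and the example data are hypothetical, the matrix of step 2 is taken to be the elementwise product of the ordinary square of S with S itself, and the repeated subdivision of step 5 is not attempted.

```python
def unicliqual_cliques(choices):
    """Map each unicliqual individual to the clique it belongs to.

    choices[i][j] is 1 if individual i chooses individual j, else 0.
    """
    n = len(choices)

    # Step 1: keep only reciprocated choices (symmetric matrix S).
    s = [[1 if (i != j and choices[i][j] and choices[j][i]) else 0
          for j in range(n)] for i in range(n)]

    # Step 2: elementwise product of S squared with S. For a linked pair
    # the entry counts the individuals linked to both members of the pair.
    m = [[s[i][j] * sum(s[i][k] * s[k][j] for k in range(n))
          for j in range(n)] for i in range(n)]

    cliques = {}
    for i in range(n):
        positives = [j for j in range(n) if m[i][j] > 0]
        count = len(positives)
        # Step 3: rows consisting entirely of zeros are dropped.
        if count == 0:
            continue
        # Step 4: the test r = n(n - 1) for membership in a single clique.
        if sum(m[i]) == count * (count - 1):
            cliques[i] = frozenset([i] + positives)
    return cliques


# Hypothetical example: cliques {0, 1, 2} and {2, 3, 4} overlap in
# individual 2; the choice of 3 by 0 is not reciprocated.
choices = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
for person, clique in sorted(unicliqual_cliques(choices).items()):
    print(person, sorted(clique))
```

Individual 2, who belongs to both cliques, fails the test in step 4 and is therefore not reported as unicliqual, although it appears as a member of each clique isolated from the other rows.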


After all unicliqual members are ac- 
counted for, further subgrouping and 
analysis is continued. These subgroups
are defined on the basis of
members whose row sums are mini- 
mal in the last reduced matrix. 

In addition to the techniques and
approaches discussed thus far, recent
developments in the fields of mathematics
and logic promise to be relevant.
It may be that the study of
groups requires consideration of more
complicated relationships than the
dyadic relationships considered above
(e.g., [...] to b about what c said; [...]
some of Tagiuri’s recent work (1952) [...]
a expresses his attitude to b, and
predicts b’s attitude toward himself).
Copilowish (1948), Copi and Harary (1953),
and Davis (1953) have considered such
n-adic relationships.


DISCUSSION


One of the major outcomes of the 
work described above has been to 
present a set of techniques for sim- 
plifying and analyzing the complex 
data generated by group functioning. 
These techniques, moreover, involve 
the translation of the data into math- 
ematical form that permits the ap- 
plication of a wide range of powerful 
analytic techniques. In addition, 
these techniques are applicable not
only to the original sociometric relations
(e.g., “like,” “chooses”) but
also to other relations (e.g., “communicates
to,” “hands materials to,”
etc.). They therefore permit the analysis
of a much wider variety of organizational
relationships than hitherto
studied.⁸

8 Examples of the structural properties of
formal military teams that may be submitted
to this type of analysis are considered by
Glaser (1958).

As these analytic techniques become
better known and more widely
applied they will help promote the
construction of quantitative systems
for group behavior. It should not be
expected, however, that these or re- 
lated techniques will furnish easy 
solutions to the problems of group 
structure and functioning. In most 
cases they will probably be helpful 
only in clarifying the requirements 
for adequate descriptive or explana- 
tory systems. At best, these tech- 
niques will provide the variables to 
be incorporated in such theories. 
There remains the task of construct- 
ing theories concerning the behavior 
of groups. There are two types of 
theorizing possible with sociometric 
or structural data. One concerns the 
causes of particular patterns or in- 
dices. The other concerns the effects 
of these. An approach to theories 
concerning the cause of patterns is in- 
dicated in some work by Landau 
(1951a, 1951b, 1953) on the special 
case of dominance relationships. 
Much more work of this type is 
needed before the usefulness of the 
various measures suggested in the 
sociometric literature can be assessed. 
The second type of theorizing con- 
cerns the effects of patterns. This 
type of theory initiated the group 
network studies. These studies will 
be evaluated in a subsequent paper.
MULTIPLE METHODS OF PERSONALITY ASSESSMENT¹
University of Western Australia


The term “personality assessment”
refers to any procedure aimed at describing
a person’s characteristic behavior
by categorizing him with respect
to some communicable dimension
or dimensions.² Since the OSS
assessment procedures, however, the 
term has tended to be pre-empted for 
the procedure where several different 
types of assessment techniques are 
applied to the subjects and the final 
assessments are made by the com- 
bined judgments of several assessors 
concerning the subjects’ predicted be- 
havior outside of the assessment situ- 
ation. These procedures are “mul- 
tiple” in two senses: with respect to 
the techniques and with respect to 
the assessors. 

Our treatment in this paper will 
deal with the basic logic of this type 
of assessment, and the discussion 
will be illustrated by the best known 
multiple personality assessments, details
of which are outlined in Table 1.³
Each one of these assessments has,
in its own way, constituted a milestone
in the history of multiple personality
assessment.

1 The author expresses his thanks to the
colleagues who have discussed various points
in this paper with him, especially to James
Lumsden; also to Saul B. Sells for his valuable
comments.

2 This is the same procedure as “instantiating
a person object in a module or set of modules,”
a terminology which the writer has preferred
in another context (Sarbin, Taft, and
Bailey, in press), but which is avoided here in
the interests of communicability.

3 Insofar as the assessments use multiple
techniques, the problems of inferring the predictions
and validating the tests are the same
as those involved in other multivariate procedures.
See, for example, the treatment of
these problems in Thorndike (1949). Our
emphasis here will be mainly on the problems
that arise from the combination of multivariate
procedures and multiple assessors.

The researches into personality 
conducted at Harvard in the 30’s un- 
der the direction of Murray (1938) 
were the first to use the typical pro- 
cedures of personality assessment— 
diagnostic committee assessments of 
personality based on interviews and 
a varied battery of objective, projec- 
tive, and situational tests. However, 
unlike the later assessments, no out- 
side criterion was used in these Har- 
vard studies, and, therefore, no more 
than passing reference will be made 
to them. The same applies to the 
continuing series of studies of per- 
sonality carried out by Cattell and 
his students (Cattell, 1957) which 
started to employ external criteria 
only at an advanced stage of its progress.
The British War Office Selection
Boards (WOSB), which were
inspired by the German officer multi- 
ple technique selection procedures 
(Farago & Gitler, 1941), pioneered 
the use of a quasi-natural social situa- 
tion, including the leaderless discus- 
sion, as a basis for judging the po- 
tential social skills of the candidate. 
They also produced the first valida- 
tion material on multiple assessment 
procedures as a means of selection. 
The British Civil Service Selection 
Boards (CISSB) continued this work, 
with more emphasis on the valida- 
tion of individual techniques as well 
as the technique as a whole. The 
OSS assessment highlighted the psy- 
chological problems inherent in as- 
sessment and won many supporters 
for the value of combining multiple
tests and observations by pooling the
judgments of several assessors; the
Michigan VA assessment program
did much to upset that support while
the Chicago and Menninger assessments
reinstated some of it through
their promising findings. The California
Institute of Personality Assessment
and Research (IPAR) differs
from the other assessments in
emphasizing research into personality
to a greater extent.

TABLE 1
DETAILS OF MILESTONE ASSESSMENT PROGRAMS

(Techniques used: individual and multiple interviews; observation of group
[...]; situational tests; objective, projective, and performance
tests; “made to measure” inventories)

Assessment | Date | Subjects | Primary purpose | Strategies (in order of importance) | Assessors
Harvard [...] | [...] | Young men, mainly Harvard undergraduates (paid subjects) | Personality research | Analytic | Psychologists
[...] ([...] Morris, [...]) | [...] | [...] | Selection | Analytic, Global | Army officers, psychiatrists, and psychologists
[...] | [...] | [...] candidates | Selection | Analytic, Global | Psychologists, psychiatrists, and other social scientists
Michigan VA ([...] Fiske, [...]) | [...] | Clinical psychol[...] | Validation of [...] | Empirical [...] | Psychologists (clinical)
IPAR (Various published and unpublished reports, e.g., Barron, 1954; Gough, 1953) | 1950-51 | Advanced graduate students | Personality research; validation of techniques | Empirical, Analytic, Global | Psychologists
Chicago (Stern, Stein, & Bloom, 1956) | 1952-54 | Students in theology, education, and arts | Validation of techniques | Analytic, Global | Psychologists
Menninger (Holt & Luborsky, 1958) | 1946-52 | Psychiatric training candidates | Selection, validation of techniques | Global, Analytic, Empirical | Psychiatrists and psychologists


THE ORIENTATION AND PURPOSE OF
PERSONALITY ASSESSMENTS


Three foci of assessment can be
distinguished: human performance in
some socially defined situation or
situations (the criterion performance);
performance in defined assessment
situations, i.e., tests (the assessment
performance); and the link
between these two performances (test
validation). Different assessment
programs have been oriented towards
one or more of these aspects depending
on their primary purpose (see
Table 1). The orientation towards
criterion performance implies the primary
purpose of assessing candidates
with respect to the criterion in order 
to select or reject them. The orienta- 
tion towards the assessment per- 
formance is concerned with the 
validation of the assessment tech- 
niques themselves, while the orienta- 
tion towards the link between per- 
formances is concerned with research 
on the functioning of personality. 
Selection was the original purpose 
for the WOSB, CISSB, and OSS 
assessments; in each case, the assessors
were presented with the immediate
problem of selecting from a
given group of candidates those who
would make the most adequate army,
civil service, and secret service officers,
respectively. After a consideration
of the personality requirements
of the positions for which they were
selecting officers, the assessors judged
the candidates on the basis of
techniques chosen either because
they appeared to have face validity
for measuring these requirements or
because the assessors were familiar
with their use. At least in the case of
wartime assessments, neither the
time available nor the conditions
permitted more scientific procedures
than that, and it was hoped that
accuracy would be achieved through
weight of numbers (of techniques
and of assessors).

TABLE 1 (Continued)

Criterion analysis method | Method of rating | Main criterion | Some selected validities (uncorrected for selection of groups)
[...] | Committee | No external criteria | [...]
Personal knowledge | 1. Committee and 2. Final Review Board | Supervisors' reports | 1. CISSB committee 0.13-0.25. 2. Review Board 0.23-0.41. (When corrected for selection the range of validities is 0.50-0.66.)
Intuitive analysis with experts | Committee | Field reports by the assessors and by field commanders on several molar traits | “Overall” ratings 0.08-0.53 (varying with assessment group and criterion). Rating of “Effective Intelligence,” 0.33-0.53
Personal knowledge | Individual and pooled ratings | Ratings by clinical teachers and supervisors on several aspects of clinical work | “Over-all” rating and clinical competence, 0.37. Miller Analogies and clinical competence, 0.35. Strong Interest Key for Clinical Psychological and Research Competence, 0.35. Other validities lower
Personal knowledge | Committee | 1. Teacher's prediction of student's professional potential. 2. [...] | 1. Cross-validated inventory with 1., 0.29. 2. Committee ratings with 2., 0.41
Committee job analysis and interviews with teachers | Committee | Teacher's judgments and exam results | Very high validities
Committee job analysis and [...] success and [...] | Individual and averaged ratings of interviewers | Supervisor's ratings (pooled) on specific and general competence | Interviews (global), 0.24. Interviews (analytic), 0.26. Tester's analytic ratings on projectives, 0.27. Objective scoring of projectives cross-validated at zero. Best interviewer (all data) 0.57

Test validation (and construction).
Some of the later assessments, notably
the VA study, set as their short-term
aim the task of developing and
validating techniques for future use
in selection. The validation studies
were applied not only to the individual
items and tests, but also to the
purely subjective techniques, such as
group observations and interviews.
In some of them, e.g., Menninger,
the individual judges were also
validated as though their judgments
were scores on a test. When we
speak of the validity of assessment
techniques it is important to include
these judgments among the techniques,
as they vary greatly in their
accuracy.

The Harvard studies were the first
to use multiple assessment techniques
for personality research, and
the outstanding recent [...] use the combined
judgments of several assessors. The
Harvard studies dealt with the correlations
between different performances
that were elicited in the test
situation, whereas the IPAR studies 
were concerned, in addition, with the 
relationship between the assessment 
performances and criterion measures 
such as ratings by university teachers 
of the subject’s professional poten- 
tial, his originality, and his personal 
soundness. The Michigan studies of 
clinical psychologists were similar in 
orientation. 

Most of the personality assess- 
ments have tried to pursue more than 
one of the above purposes at once, 
but there are drawbacks to such at- 
tempts at economy. For example, an 
attempt was made in the CISSB 
studies (Vernon, 1950) to combine 
selection and validation, but the 
validation indices were lowered and 
distorted by the attenuation of the 
sample through rejection of candi- 
dates. The low validities obtained 
became remarkably high (for that 
sort of prediction) when a correction 
for selection was applied, but such 
corrections are only arbitrary esti- 
mates. The use of assessment proce- 
dures for selection implies that the 
procedures have already been vali- 
dated, but this has usually not been 
the case. The assessors have either 
had to use whatever prior knowledge 
they possessed about the validity of 
the techniques for the purpose at
hand, or they have had to base their 
predictions on the relevant postulates 
in their theory concerning the link 
between the assessment and the cri- 
terion behavior of the subjects. For 
example, the assessors presume that 
the situational tests in the assessment 
program have what Cronbach and 
Meehl have termed “content” valid- 
ity (1955). But in selection, this type 

of validity can be regarded only as a 
holding procedure for an ultimate 
“predictive” validity. Where the 


RONALD TAFT 


criteria are imprecise and not re- 
peatable, or where selection is urgent, 
a separate validation study may not 
be practical, and under these circum- 
stances there is no alternative to con- 
ducting selection without prior vali- 
dation. It still may be possible, how- 
ever, over a period of time, to utilize 
the imperfect validational material 
that becomes available in order to 
improve the existing selection proce- 
dures. This seems to have been the 
case, for example, in the OSS studies. 
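
The effect of the "correction for selection" mentioned above in connection with the CISSB validities can be illustrated with the standard univariate correction for restriction of range, often cited as Thorndike's Case 2. The source does not say which correction was actually applied, so this is a sketch under that assumption; the function name and the standard-deviation ratios are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the validity in the unselected group.

    r_restricted: validity observed among the selected candidates.
    sd_ratio: predictor standard deviation among all candidates divided by
              its standard deviation among those selected (assumed known).
    """
    r = r_restricted
    k = sd_ratio
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Illustrative only: the same observed validity under different assumed ratios.
observed = 0.25
for ratio in (1.0, 1.5, 2.0, 3.0):
    corrected = correct_for_range_restriction(observed, ratio)
    print(f"SD ratio {ratio:.1f}: corrected validity {corrected:.2f}")
```

The corrected figure depends heavily on the assumed ratio of standard deviations, which is one way of seeing why such corrections can be regarded as no more than estimates.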

Validation studies of the assess- 
ment techniques also logically pre- 
cede the use of those procedures for 
personality research, although tech- 
niques used in such research often 
are accepted on the basis of their 
face validity. To use the one and 
same study to validate the tech- 
niques and to use them to measure 
personality is lifting oneself up by
one’s boot-straps. In fact, however,
the assessments which attempt to
carry out this dual purpose obtain
independent support for the “boot-strap
lift” from already existing information
regarding both validation
and the functioning of personality. 
Even then, the interpretation of per- 
sonality research projects that do not 
commence with a pilot study on the 
validity of the instruments is always 
subject to doubt. How do you know 
that expressed hostility to authority 
figures on the TAT measures suppressed
rebellious tendencies? How
do you know, when assessor X observes
a subject to be dominant, that
he is dominant? How do you know
that observed “role empathy” in a
role-playing test is a valid predictor
of social skill? Such questions can be
answered only by the progressive
refinement of validity information
and personality theory.

Assessment procedures usually rely
on many unvalidated tests, and when
the correlations between the tests are
used as a means of studying personality
(as in the case, for example,
of the Harvard and IPAR studies)
it is necessary to decide whether
these correlations are to be treated 
simply as validity indices, or whether 
the validity of the tests will be as- 
sumed and the correlations treated as 
throwing light on the relationships 
between different personality struc- 
tures. The problem of simultaneous 
validation of tests and the study of 
personality is related to the problem 
of “concurrent” and “construct” 
validity. By setting up some of the 
behavioral measures made during the 
assessment as tentative criteria, it is 
possible to validate other assessment 
measures against these. Cronbach 
and Meehl call this concurrent vali- 
dation, and it is one way of utilizing 
previous knowledge of validities by 
choosing criteria measures that have 
reasonably well-established reliabili- 
ties and validities. Then on the basis 
of all that is known about these 
measures, their implications for the 
understanding of personality can be 
explored further by a strategy of 
construct validation. The data collected
during the assessment can be
added to the “nomological nets” already
used in thinking about the
particular personality constructs and
new hypotheses developed for investigation
in later studies. Thus, even
an assessment program that is aimed
primarily at the purpose of selection
can make a contribution to personality
research through construct validation.
(The place of construct validity
in an assessment program is discussed
more specifically below under the
heading of analytic strategy.) This
concept also enables an assessment
program to avoid the problem of the 
priority of validation of instruments 
(versus conducting personality re- 
search) by conceiving both validation 
and personality research as two as- 


pects of the one endeavor, both as- 
pects gradually throwing light on 
each other as more and more data 
accumulate. 

But this double-aspect approach 
of construct validation is an uneco- 
nomical process. Refinements may 
often be made more readily to our 
personality theory or to our knowl- 
edge of the validity of the techniques 
by a more direct approach to one or 
the other. In this case the problem 
of priorities which we have discussed 
cannot be avoided. 


THE PREDICTION STRATEGIES 
IN ASSESSMENT 


The Criterion 

All assessment programs involve 
studies of the link between two or 
more pieces of behavior, whether the 
primary purpose be selection, validation
research on tests, or personality
research. Some of this behavior is
known as assessment behavior and
some as criterion behavior. These
concepts are analogous to the inde- 
pendent and dependent variables in 
experimental psychology, and it is an 
arbitrary decision by the experimen- 
ter which one is designated as which. 
Most of the reports of assessments 
have devoted some space to the cri- 
terion problem, especially the report 
of the Chicago assessments (Stern, 
Stein, & Bloom, 1956). Most of the 
problems are similar to those in- 
volved in the validation of mul- 
tivariate objective techniques dis- 
cussed, for example, by Thorndike 
(1949).

A special problem that arises in
personality assessment is the frequent
unreliability of the criteria [...]
that vary from one criterion [...]
to another. This unreliability imposes
a serious limitation on the potential
validity of personality assessments,
and it makes it difficult to evaluate
some of the low validity coefficients
reported.
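
The limit that unreliability places on validity can be made concrete with the classical attenuation relation, under which an observed validity cannot exceed the square root of the product of the test and criterion reliabilities. The sketch below assumes that classical model; the function names and the reliability figures are hypothetical and are not taken from any of the assessment programs.

```python
import math


def validity_ceiling(test_reliability, criterion_reliability):
    """Largest validity obtainable given the two reliabilities."""
    return math.sqrt(test_reliability * criterion_reliability)


def disattenuated_validity(observed_r, test_reliability, criterion_reliability):
    """Observed validity corrected for unreliability in test and criterion."""
    return observed_r / validity_ceiling(test_reliability, criterion_reliability)


# Illustrative only: a reliable test predicting an unreliable criterion rating.
ceiling = validity_ceiling(0.90, 0.40)
corrected = disattenuated_validity(0.30, 0.90, 0.40)
print(f"ceiling {ceiling:.2f}, corrected validity {corrected:.2f}")
```

A coefficient that looks low in absolute terms may therefore be close to the most that the criterion's reliability allows.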
The designation of the criteria of 
performance is determined by the cir- 
cumstances of the assessment, and 
usually must be taken for granted by 
the assessors. Thus, in the Chicago 
study the assessors explicitly ac- 
cepted the principle that the criterion 
ratings represented the predilections 
of one or more supervisors with whom 
the subjects interacted in the cri- 
terion situation, and that the asses- 
sors’ predictions of the subjects’ suc- 
cess must be made in reference to the 
“psychological job requirements” im- 
plied by these predilections and in- 
teractions. The assessment strategy 
should be aimed at the criterion, once 
the latter has been established. Kelly 
(1957) did not accept this principle in 
his researches on medical school se- 
lection. In this study he analyzed 
the criterion measures and found 
that there were at least three, and 
possibly four, types of medical per- 
formance which could be predicted 
independently. In the long run, how- 
ever, a selection program has to 
choose between the independent cri- 
teria, or the criteria have to be com- 
bined by some type of simple, 
weighted, or complex, interactional 
summation, or by taking account of 
one critical instance.
A complication that arises in cri- 
teria analysis, such as that of Kelly, 
is that an assessor can only predict 
to indices of the criteria, not to the 
actual criteria themselves. It may be
possible in some instances for the as- 
sessor to demonstrate that an index 
used in assessment has a low correla- 
tion with some more satisfactory, al- 
though less accessible, criterion index; 
for example, that academic grades in 
medicine do not represent the doctor’s 
subsequent service to the community 
as a practitioner. Assuming that the 
latter is accepted as the more funda- 


mental in medical practice, the as- 
sessors should predict to it rather 
than to academic grades by trying to 
obtain some accessible index which 
more realistically measures this cri- 
terion of community service. Some- 
times the assessors may be able to 
convince those who control the cri- 
terion ratings that the indices which 
the latter are using are not consistent 
with their fundamental criterion, but 
eventually the assessors and the cri- 
terion raters must agree on some 
criterion index in accordance with 
the policy of the organization. Other- 
wise it would be absurd to speak of 
the validity of the assessment. 

Three types of strategies can be 
distinguished for predicting the criteria
performance: naive empirical,
global, and analytic. We shall now
consider each strategy in detail.

1. Naive empirical. This refers to 
the classical method of test construc- 
tion, adapted from aptitude testing, 
in which the inclusion in a selection 
program of a test (or test item,
which may be treated for our purposes
as a separate test) is determined
mainly by its predictive validity,
i.e., by the degree to which it correlates
with or discriminates a specified
criterion. Tests that are not
sufficiently valid are either dropped
from the program or amended and no
consideration is given to the meaning
of the test behavior, except as an
afterthought. The naive empirical
strategy, thus, is one in which inference
proceeds directly from test to
criterion without the mediation of
intervening variables.

Not a great deal of use has been 
made of this empirical strategy in 
multiple personality assessment, 
partly because of intellectual resistance
to atheoretical procedures on
the part of personality researchers,
and partly because of the absence of 
reliable criteria. The outstanding examples
of the use of the naive empirical
strategy are found in the
[...], especially in the scales
of the California Psychological Inventory
and the Adjective Check
List compiled by Gough. These
[...] give unit weight rather
than [...] weights to the tests, i.e.,
[...] predictions to be made
[...] complex behavioral criteria,
[...] tolerance, delinquency,
academic achievement, neurodermatitis,
and social status (Gough,
[...]; unpublished bibliography, IPAR,
[...]).

The naive empirical strategy has
the advantage over other strategies
of [...] and also of enabling
[...] to predict complex and little
understood behavior. But it also
has serious limitations: it can be used
only where suitable criteria groups
are available for validation and cross-validation,
and the validities may
[...] owing to changes in significant
aspects of the conditions: temporal,
geographic, public attitudes
and information, set of the subjects,
etc. Either some understanding of
the [...] theoretical factors is
required to provide a warning system
against “drift,” or constant revalidation
must be carried out.

The primary purpose served by the
naive empirical approach is that of
constructing and validating assessment
instruments, although the long-term
purpose can be both selection
and research on personality. Up to
a point the personality research aim
can be served simultaneously with
the validation aim, since the discovery
of the intercorrelations between
the tests themselves and the
criteria [...] suggest personality [...].
But we are now back on the
problem of priorities: we can use
validation studies for personality research
only if we already possess
postulates about the significance for
personality of the behavior tapped by
the tests and the criteria.

In this reference we should briefly 
consider the sources of the test items 
that are used in the validation “try- 
outs.” The sources may be naive 
empirical, or they may be theoretical. 
Empirical sources include: tests in 
the general area that are traditionally 
used, those that are readily available 
and can conveniently be given, tests 
whose title or item content bear a 
superficial relationship to the cri- 
terion, and tests which have pre- 
viously been shown to relate to the 
criterion. Theoretical sources of 
tests, on the other hand, include the 
systematic or unsystematic sampling 
—usually the latter—of the areas of 
personality that are considered by the 
researcher to be relevant to the cri- 
terion behavior. The empirical out- 
look of the student who is developing 
personality assessment techniques is 
seldom so naive that it is entirely 
uninformed by theoretical considerations,
so that the “naive empirical”
approach in practice tends to become
mediated by intervening structures
and thus to approach the analytic
strategy described below. The intervening
structures, however, are
not made explicit in this empirical
approach.

2. Global. This is the second nonmediated
strategy, in which the assessor
relies on his intuition, empathy,
and verständnis processes to
provide the predictions, rather than
using statistically established associations
between assessment [...]
and criteria. If any analysis is made
of the criterion in [...]
is directed at the social [...]
rather than a [...]. [...] the assessors
about the subject’s performance
on objective tests, and even concerning
the validity of these tests,
for example in the Menninger studies
(Holt, 1958), but the ultimate assessment
is a global one. This procedure
is the personality assessor’s
answer to some of the drawbacks of 
the empirical strategy. Intuitive pre- 
dictions can be used when the asses- 
sors have only a vague concept of the 
criterion conditions, but empirical 
methods require clear-cut criteria and 
expendable samples of trial subjects 
who have been rated on these criteria. 
Where this is impossible, as in the 
case of the OSS studies, the empirical 
strategy cannot be used, and intui- 
tion must be resorted to. 

The distinction between empirical 
and global strategies also is analogous 
to the distinction between narrow- 
and wide-band techniques (Cron- 
bach & Gleser, 1957), the former en- 
abling comparatively more reliable 
but limited predictions. Supporters
of the global strategy have claimed 

for it special adaptability to the 
vagaries of the conditions associated 
with both the assessment and cri- 
terion situations. Some writers also 
claim for it a special virtue in con- 
nection with personality research in 
that it avoids the violation of a 
“whole” person inherent in trait 
psychology; however, it is very
doubtful whether it is correct to use
the word “research” to describe a
mode of study which, if it were applied
in its pure form of global
verständnis, would by definition preclude
communication of the assessments.

The value of the claims of the
global strategists to have improved
on empirical validation as a basis for
selection programs is limited. Subjective
methods of making predictions
have seldom been shown to be
superior to objective methods where
these are available, excepting in the 
case of especially competent assessors 
(see below). The relative competence 
of the assessors in making predictions 
about the subjects is analogous to the 
relative validity of the tests, and 
both can be established by the same 
type of validation techniques. In 
this way the empirical and the global 
strategies are similar in orientation: 
the proviso that incompetent as- 
sessors should either be eliminated 
from the assessment panel or trained 
to eliminate errors is analogous to the 
dropping or amending of an invalid 
test in the empirical strategy. 

The “nonanalytic” techniques used 
in the global strategy are not neces- 
sarily nonmediated by personality 
constructs, even though these con- 
structs may not be made explicit. 
The process of moving from observa- 
tions of behavior to inferences about 
future behavior uses a set of postu- 
lates about personality and various 
derived premises; these premises in- 
volve certain personality constructs 
or categories into which the assessor 
places the behavior of the subjects, 
i.e., he “instantiates” the behavior.
Intuitive inferences, even empathic 
ones, can be reduced to this formula- 
tion which provides a bridge between 
analytic and nonanalytic processes. 
This point is elaborated by Sarbin, 
Taft, and Bailey (in press). 

3. Analytic. The analytic strategy 
makes explicit the role of mediating 
constructs in prediction. A two-stage
inference is involved; first,
there is an inference from the criterion
requirement to the traits that
are relevant to that performance (the 
“criterion analysis”); and, secondly, 
an inference from the subject’s ob- 
served behavior and test performance 
to his status on the trait dimensions 
(the assessment). Research on the
validity of these inferences requires
two separate studies: one of the 
validity of the analysis of the cri- 
terion requirements and the criterion 
indices, and one of the validity of the 
tests as predictors of the criterion. 
These validation studies should be 
based on independent samples of be- 
havior and, for preference, on inde- 
pendent samples of subjects, the re- 
search on the criterion analysis to 
precede the validation of the in- 
struments. 

The importance of criterion anal- 
ysis was recognized in each one of 
the “milestone” assessments, but the 
validity of the analysis is usually as- 
sumed. Two types of approach to 
the criterion analysis problem have 
been used: intuitive and empirical. 
The intuitive approach is the one 
usually used in personality assess- 
ment; typically the assessors have 
used either the testimony of ‘‘ex- 
perts” or their own theoretical anal- 
ysis to determine the criterion re- 
quirements. These analyses rest on a 
theory of personality, but the theory 
is usually not made explicit, nor is it
subjected to empirical validation. 

The empirical approach to criterion 
analysis can employ subjective or ob- 
jective methods. The Menninger 
studies, for example, employed sub- 
jective rating methods to compare 
the characteristics of successful and 
unsuccessful psychiatrists.* The VA 
assessment program was, among 
other things, one big empirical cri- 
terion analysis using both subjective 
and objective methods. The study 
began with no explicit analysis of the 


* Knowledge of the results of this analysis
did not improve the validity of the assessors'
predictions, but this could have been caused 
by the assessors preferring to use a global 
rather than analytic strategy despite the 
analytic information which was supplied 
(Holt, 1958). 
requirements in clinical psychology
and ended with an explicit descrip- 
tion of some of the characteristics 
which relate to success in various as- 
pects of that profession. In a sense, 
all preliminary validation try-outs of 
tests in a naive empirical strategy, 
such as those used in the VA and 
IPAR programs, constitute a cri- 
terion analysis. The cross-validation 
that follows may thus be regarded 
as testing a series of hypotheses about 
the criterion behavior. Referring 
once more to Cronbach and Meehl’s 
contribution (1955), we see now that 
the analytic strategy is a type of con- 
struct validation which attempts to 
augment the “nomological net” sur- 
rounding the relevant constructs. 
The main difficulty with the ana- 
lytic method of assessment is that it 
requires a set of constructs which 
may not exist in our present state of 
psychological knowledge—although 
the assessment results may contrib- 
ute to the development of such con- 
structs. The difficulties which factor 
analysts often encounter in their at-
tempts to label their factors lead one
to sympathize with Cattell’s pref- 
erence for using reference letters and 
numbers rather than trying to find
meaningful labels for his personality 
factors (Cattell, 1957). The analytic 
method, then, is limited by the cur-
rent state of development of per-
sonality theory. A further drawback
of a thoroughgoing analytic method
of assessment is the practical con-
sideration of effort; the
returns may be just as great, prob-
ably greater, in the first pilot assess-
ments in a program, if one uses an
empirical or global strategy without
trying to make explicit the underly-
ing theoretical relationships. In addi-
tion, analytic assessments require a
double inference and consequently
the possibilities of error are increased;
either the criterion analysis or the
ratings of the candidate might be in
error. In the analytic strategy, how-
ever, there is at least the hope that
the sources of these errors will be 
discovered and corrected, whereas 
the sources are masked in the non- 
mediated strategies. 
The analytic strategy is applicable 
to any of the three purposes, selec- 
tion, validation research or personal- 
ity research, but its greatest poten- 
tial is in the latter; in fact, if the re- 
sults of assessment are to be of any 
value in increasing our understand- 
ing of personality, it is essential that 
the data be expressed in terms of 
basic personality constructs under- 
lying the subject’s behavior so that 
the scores and observations on the 
subjects may become meaningful. 
This applies both to naive empirical 
strategies such as factor analysis or 
blind item validation, and to global 
strategies in which the mediating 
constructs are not made explicit. 
To sum up: we have argued that
both the naive empirical and the 
global strategies are actually medi- 
ated by analytic personality con-
structs, but that it is not always
necessary, or even possible to make 
those mediating variables explicit. 
This may apply both when the pur- 
pose of the assessment is validation 
of the techniques or the carrying out 
of an actual selection. But when the
purpose is personality research, some 
explicit handling of the constructs is 
advisable. The concept of construct 
validation supports this requirement 
by merging the validation and the 
personality research orientations. 
Each of the three strategies has its 
particular uses in assessment pro- 
grams. Where mass screening is re- 
quired, the empirical strategy is usu- 
ally best, if possible; where the cri- 
terion situation is complex and un-
repeatable, but familiar to the asses-
sors, the global approach is to be 
preferred, and where the relevant 
personality theory has attained a suf- 
ficient level of development, the 
analytic strategy is indicated. Where 
none of the basic requirements are 
present—a repeatable and reliable 
criterion, familiarity of the criterion 
to the assessors, or appropriate per- 
sonality theory—the assessors have 
to choose the strategy that seems
best, although no strategy can really 
redeem such a hopeless situation. In 
general, personality assessors being 
what they are, they will prefer a 
largely intuitive approach, either 
analytic or global, as they did in the 
WOSB and OSS situations, but an 
increasing respect seems to be paid 
to the need for illuminating these in- 
tuitive methods by empirical anal- 
ysis wherever possible. 


SOME SPECIFIC ISSUES IN ASSESSMENT
AS A METHOD OF PREDICTING 


Clinical Versus Statistical Approaches


We have argued that there are oc- 
casions when intuitive methods of 
making predictions, i.e., “clinical,”
have their appropriate place. Statis- 
tical methods cannot be used where 
no prediction formula exists. But 
some personality assessors speak as 
if the clinical method is always to be 
preferred as it enables the assessor to 
be flexible in his use of the data in a 
way that is not possible with statis- 
tical techniques; for example, the 
clinician can give weight to obvious 
but rare and nonrepeatable factors in 
the subject’s current situation which 
could not be validated empirically. 

Other advantages claimed for the
clinical against the statistical ap- 
proach are that it does not violate 
the essential unity of the subject's 
personality, and that it enables the
use of empathy and recipathy in
making the predictions. (Actually 
these subjective clues could also be 
used as data by the statistician along 
with other more objective data.) 

Other assessors regard clinical tech-
niques as only a last resort. A num-
ber of advantages can be quoted for 
statistical prediction over clinical, 
most of which boil down to the fact 
that the statistician has a far more 
efficient memory and a larger atten- 
tion span than the clinician; he can 
“remember” the relevant data at the
appropriate time and combine them 
with other data in order to obtain 
optimal weightings for future pre-
dictions.

And so we have, on the one hand, 
the efficient but rigid and inhuman 
statistical prediction, and on the 
other, the flexible and humane but 
inefficient clinical. Which one is more 
accurate in personality assessment?

There are several discussions of this
question available (e.g., Cronbach,
1956; Meehl, 1954; Holt, 1958; 
Sarbin, Taft, and Bailey, in press; 
McArthur, 1954) so the points will 
not be elaborated fully here except- 
ing in so far as they directly affect 
multiple personality assessment pro- 
cedures. 

The weight of the evidence clearly
supports the accuracy of the statis-
tical approach compared with the
clinical. Meehl’s notorious score-
board (1954) recording the relative
validity of clinical versus statistical
prediction mounts grim evidence in
favor of the latter. Holt (1958) crit-
icizes Meehl’s summary on the
grounds that most of the studies
pitted sophisticated, actuarial pre-
dictions against “naive clinical,”
while others (e.g., Wittman’s) ac-
tually showed the superiority of
“sophisticated clinical” over naive
clinical methods. But Meehl is quite
clear about the rules of his contest:
the rival methods start off with ap- 
proximately the same objective and 
subjective data, although in some of 
the studies the clinician used addi- 
tional subjective data. The impor- 
tant difference is that the reported 
statistical predictions were based on 
the naive empirical method of valida- 
tion, while the clinical were either 
global or intuitively analytic. The 
statistical approaches were not con- 
cerned with the meaning of the cor- 
relations between the data and the 
criteria, although the use of cross- 
validation and statistical refinement 
meant that the empirical procedures 
were not as naive as it appeared, nor 
were they always uninformed by in- 
tervening personality constructs. 
Holt pleads for the use of “sophis- 
ticated clinical methods,” by which 
he means something similar to our
analytic strategy, using intuition to
make the final predictions. Among
other things, he wants the clinicians
to make preliminary studies of the
criterion behavior, in order to analyze
the requirements for success. Holt
does not take the step of requiring
validation of the individual clini-
cians; but this is necessary to match
fully the two sides in the contest. He
reports that the best judges, using
global clinical techniques, reached
prediction validities of up to [...],
whereas statistical treatment of the 
tests—regular Rorschach scoring 
(validated and cross-validated) and 
the Strong Interest Psychiatrist key 
—resulted in virtually zero validities. 
But Holt’s contest is unfair to the
statistical side. His experiment was
a half-hearted affair; no attempt was
made to develop predictors that
would be appropriate to the
problem at hand, as was done in the
VA and the IPAR studies, and on
Holt’s own admission the Strong key
was validated a long time previously
in an entirely different situation. 
Holt’s report, as he himself indicates, 
does not provide us with a fair con- 
test between sophisticated clinical 
and sophisticated statistical ap-
proaches.

In recommending a sophisticated 
clinical approach, Holt argues that 
“there simply is no substitute for 
empirical study of the actual asso- 
ciation between a type of predictive 
data and the criterion” (1958, p. 3). 
Despite this, the evidence that he 
presents on the value of objective 
criterion analysis for the assessor 
(i.e., validation) is not promising. 
The assessors at Menninger were 
provided with “manuals” embodying 
validation material on the interview, 
TAT, Rorschach, and other assess- 
ment techniques that had been used 
in an earlier assessment of psychia-
trists at Menninger. Holt’s conclu-
sion about their value reads as fol-
lows: “Of the six, two proved worth- 
less... ; the other four all showed 
more or less promise, but there was 
none that yielded consistently signifi- 
cant validities regardless of who used 
it” (1958, p. 8, italics ours). Evi-
dently, the assessors would not, or
could not, use the validation data 
which were provided for them. 

We are thus reminded once more
that validation includes validation of
the specific assessors carrying out the
specific assessment task. Nearly all
reports of personality assessments
offer evidence that the assessors dif-
fer considerably in their predictive
skill. These differences are made up
of two types of variation: variation
due to differences in general ability
to judge people (Taft, 1955), and in-
teraction effects between the assessor
and the type of judgment called for
(Crow & Hammond, 1957). The re-
ports on assessments offer the hint
that the highest validities are
achieved by assessors who have the 
most familiarity with the criterion 
situations and with the type of per- 
son who is successful in those situa- 
tions; for example, in the CISSB 
assessments, the Board of Review 
consisting of experienced civil service 
administrators made more accurate 
predictions than did the original 
CISSB selection committee. In the 
former, the most valid predictions 
were made by the chairmen who were 
also civil service administrators. 
Accurate assessments are most 
likely to occur where the assessor 
uses the in-group stereotypes which 
are also held by the criterion raters; 
they are able to “play their predic- 
tions by ear” without any need to 
make the double inference involved 
in analytic techniques. In support of 
this method of predicting we can 
quote the comparatively high validi-
ties found for ratings of the “like-
ableness” of the candidates in the
Michigan, Menninger, and IPAR
assessments. For example, in the
latter, the assessors were mainly uni-
versity professors, and it is therefore
not surprising that their ratings of 
“personal soundness” correlated as 
high as 0.52 with ratings made of 
the candidates on this quality by 
their own departmental professors 
(Barron, 1954). All other things be- 
ing equal, the best assessors for pre- 
dicting existing criteria are those who 
are partially contaminated with the 
same experience, standards, and out-
look as the criterion raters and can
thus rely on a global strategy to make
their predictions. (The most ac-
curate assessors are also more ac-
curate than the most accurate, cross-
validated, tests.)


The validity of analytic methods
is subject to the accuracy of the per-
sonality theory which the assessor
uses, but psychologists usually pos-
sess fairly stable postulates, based
on the lore of their discipline rather 
than behavior-oriented empirical re- 
search, and these are not readily 
changed in the light of actual em- 
pirical data. This is probably the 
reason why some of the Menninger 
assessors did not improve their ac- 
curacy with the help of the em- 
pirically-based “manual.” The dif- 
ficulty can be seen clearly if we con- 
sider the findings of the Minnesota 
starvation studies (Kjenaas & Bro- 
zek, 1952) that the Rorschach in- 
dices of adjustment had a negative 
validity in predicting ratings of the 
subject’s adjustment after starva- 
tion. Could a typical clinical psychol- 
ogist bring himself to reverse com- 
pletely his normal interpretation of 
the Rorschach in order to predict the 
subject’s adjustment under the cri- 
terion conditions? Not unless he 
were able to find an intervening vari- 
able between the Rorschach and the 
criterion that would enable him to 
understand the connection within the 
framework of his existing theory of 
personality. 

Our discussion of clinical versus 
statistical methods of assessment has 
concentrated on one aspect of the 
procedure, the prediction-making 
stage. The contrast between these 
two approaches can be made in con- 
nection with a whole chain of deci- 
sions that must be made in the 
course of assessment: these decisions 
include determining the acceptable 
criteria, scoring the criterion be- 
havior, conducting the criterion anal- 
ysis, determining the form of the 
tests and standard situations, observ- 
ing and classifying the assessment 
behavior (i.e., scoring), combining 
the observations made by any single 
assessor into an assessment or pre-
diction and combining the predic-
tions made by different assessors.
For example, should the individual 
assessments be combined subjec- 
tively by the chairman of an assess- 
ment board, by voting, or by averag- 
ing the individual predictions? In- 
sufficient attention has been given to 
the relative merits of subjective and 
objective methods at each one of 
these stages. 

The choice of method will depend 
on both the requirements and the 
over-all situation, including, some- 
times, public relations considerations. 
The final selection of assessment 
techniques is likely to be a mixture 
of both subjective and objective, but 
the circumstances that will favor 
one or the other at any stage are 
rather vague, and the choice is 
usually made on subjective grounds, 
although it, too, could be made on 
the basis of objective, empirical in- 
vestigation. In general, objective 
methods are to be preferred as far 
as possible as they maximize accu- 
racy, but practical considerations of 
economy, convenience, and the limi- 
tations of the situation, dictate the 
wholesale use of subjective methods 
in personality assessment. These 
subjective methods may have high 
validity under favorable circum- 
stances, and where the assessors are 
familiar with the criterion situation, 
clinical judgments may actually be 
more accurate than any objective 
methods are ever likely to be in 
predicting to criteria. 


Conditional Variables in the Criterion 


An old problem in evaluating the 
validity of prediction is set by varia- 
tions in the criterion situation attrib- 
utable to the surrounding conditions. 
For example, a prediction that a
candidate will make a good officer 
may be invalidated through some 
contingency such as being posted to 
a commanding officer with whom he
is incompatible. But these condi- 
tional factors do not stand on their 
own; there is an interaction between 
the person and the condition. Thus, 
Officer A may have the type of per- 
sonality (or background) that makes 
it likely that he will be posted to a 
commanding officer with whom he 
will be incompatible; if, for instance, 
Candidate A is Jewish, he is more 
likely to have a CO who behaves un- 
congenially than is another candidate 
of a similar personality who is not 
Jewish. Further, Officer A may per- 
form his duties better than otherwise 
when he has an uncongenial CO, 
while Officer B may perform his 
duties worse under the same circum- 
stances. In most assessments, no 
specific reference is made to such con- 
ditional factors and there is an im- 
plicit assumption of “given normal 
conditions” attached to the predic- 
tions. The OSS reports a validity of 
only 0.19 for all cases from Station S 
compared with 0.39 for only the cases 
who were given assignments that 
were consistent with the ones for 
which they were assessed. 

A further condition that is often 
ignored in assessment is that of 
effluxion of time; the predictions are 
usually made on the assumption that 
the status of the candidate on the 
relevant variables will remain con- 
stant over time. At a more sophis- 
ticated level, trends towards change 
may be observed in the candidate 
together with potential but as yet un- 
realized capacities, and the assess- 
ment may extrapolate these into the 
future. But it is virtually impossible 
to take into account subsequent 
learning, maturation and deteriora-
tion in the assessment prediction.

In this connection, Cronbach and 
Gleser (1957) have proposed a useful 
distinction between fixed treatment 
(the same conditions for all success-
ful candidates) and adaptive treat-
ments varying according to the can- 
didate. Evidently the treatment of 
the OSS selectees was fixed rather 
than adaptive, and the predictions 
should have taken this into account. 
Five different types of solutions are 
suggested below for the problem of 
conditional factors in these treat- 
ments, Solutions 1 and 3 being par- 
ticularly appropriate to fixed treat- 
ments, and the other three to adap- 
tive. (These represent an expansion
of the three solutions proposed in
Horst, 1941, ch. 5.)


1. Adjust the criterion ratings ex post facto 
according to the ease or difficulty presented to 
the candidate by the criterion conditions and 
the effects of these conditions on him over the 
relevant period of time. This adjustment re-
quires an intuitive judgment that takes into
account the interaction between the condi- 
tions and the candidate, and this can be done 
only by the rater making a further, independ- 
ent assessment of the candidate. For the 
validation to carry conviction, it is necessary 
that the adjustment to the criterion rating be 
made independently of the assessment. 

2. Make the predictions to the ideal pos- 
sible conditions so that they represent the 
candidate's fullest potential; the criterion 
ratings can then be made in accordance with 
the same standards. In other words, both 
assessment and prediction attempt to hold 
conditions constant in the form that is con- 
sidered to be optimal for the candidate's per- 
formance. The actual conditions applying 
at the time of assessment and during the cri- 
terion performance are unlikely to be optimal 
no matter how hard this state is sought, so
that the use of this solution rests very heavily
on intuition.


3. Predict to the average or modal condi-
tions that have prevailed [...] respect to this
[...] expected [...] usual orientation of em-
pirical validation since the correlations on
which the validities are based are in effect
averages. The empirical strategy automati-
cally takes into account the variation in con-
ditions as well [...] maximizes the [...]
conditions. This [...] the assessment [...]
the criterion performance, without regard to
their specific nature. It is practically impos- 
sible for a clinician to average all possible 
relevant conditions by an intuitive act, al- 
though it is common for a clinician to bear in 
mind the modal conditions which candidates 
face when these are prominent. 

4. The future conditions may be predicted
specifically for each candidate so that the in-
teraction between the candidate and the cri- 
terion conditions may be anticipated in the 
assessment. The prediction to the future 
conditions may be made on the basis of inside 
knowledge of the treatment to be given to the 
candidates in the criterion situation, or by 
forecasting on general grounds the specific 
changes that will occur in the conditions be- 
fore the criterion ratings are made. Such pre- 
dictions must be intuitive rather than em- 
pirical, and, by the nature of the complexity 
of man’s environment, all such intuitive pre- 
dictions must fall well short of perfect valid- 
ity. In some complex situations, in which the 
criterion performance is highly dependent on 
the conditions, the inability of the assessors 
to predict the specific conditions that will 
operate for any particular candidate may 
render the assessments completely invalid.

5. The predictions themselves can be made
in terms of specific conditions: “if X condi- 
tions occur, then the candidate will be success- 
ful.” In this endeavor, the recent proposal by 
Cattell (1957, pp. 426ff.) for a taxonomy of 
situations might eventually supply a list of 
standard situations to be considered in condi- 
tional predictions. 


Solutions 4 and 5 are both specific 
conditional solutions which can take 
into account the effects of conditions 
that are external to the candidate, as 
well as intrinsic conditions such as 
maturation. They require both a 
knowledge of the criteria require- 
ments and a correct assessment of 
the candidate, but the first type of 
conditional prediction emphasizes the 
criterion situation, and the second, 
the candidate. Both of these latter 
methods of meeting the problem 
of conditions are adaptable to tak- 
ing into account multiple condi- 
tions and also “adaptive treatments” 
such as provisions for training that 
are tailor-made for the candidate. 
They hold out the possibility of mak-
ing more exact predictions than can
be made by the other three attempted 
solutions to the problem. This is one 
of the reasons why the global strat- 
egy, or slightly analytic versions of 
it, have been so often favored in se- 
lection assessment programs. But 
these conditional predictions are also 
the most difficult to make, and only 
the best judges of personality or the 
ones who are most experienced with 
the criteria conditions are able to 
make them accurately, and then only 
in appropriate situations. 

The decision as to the appropriate 
solution to the problem of varying 
conditions is closely related to the 
choice of strategy. In the long run 
the choice is one between elegance 
and the practical limitations that are 
imposed on the possibilities of accu-
racy.


The Assumption of Safety in Numbers 


Personality assessment programs 
rely on numbers to improve their 
validity in two directions: multiple 
tests and multiple assessments. We 
shall treat the evidence concerning 
these two points separately. 

Multiple tests. Where the tests and 
other assessment measures are com- 
bined objectively, for example, in 
accordance with a multiple regression 
equation, even the most valid test 
can usually be improved upon by 
adding one or two further measures 
to it. It is often striking, however, 
how quickly the multiple Rs reach 
their ceiling; the common compo- 
nents of almost all available person-
ality measures seem to be so high
that we quickly exhaust the new ele- 
ments that additional tests can bring 
to the predictions. The same applies 
when the combining of elements is
carried out intuitively, even though
the clinician may believe that the 
pieces of information about a candi-
date are independent of each other.
It is doubtful whether a clinician can 
use more than a few pieces of data 
that are relatively independent, even 
if they can be found. Sarbin (1942), 
for instance, demonstrated that clini- 
cians who were given a mass of data 
from which to predict the success of 
university students, gave most of the 
weight to two variables only. 
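
The ceiling on the multiple R described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every predictor has the same validity and the same correlation with every other predictor, in which case the multiple R has a simple closed form. The function names and the values 0.30 and 0.50 are hypothetical and are not taken from any of the assessment programs discussed here.

```python
import math


def multiple_r(k, validity, intercorrelation):
    """Multiple R for k predictors under an equal-correlation assumption.

    Assumes every predictor correlates `validity` with the criterion and
    `intercorrelation` with each of the other predictors (illustrative only).
    """
    r_squared = k * validity ** 2 / (1 + (k - 1) * intercorrelation)
    return math.sqrt(r_squared)


def ceiling_r(validity, intercorrelation):
    """Limit approached by multiple_r as the number of predictors grows."""
    return validity / math.sqrt(intercorrelation)


if __name__ == "__main__":
    # Hypothetical values: each test has validity 0.30, tests intercorrelate 0.50.
    for k in (1, 2, 3, 5, 10, 50):
        print(k, round(multiple_r(k, 0.30, 0.50), 3))
    print("ceiling", round(ceiling_r(0.30, 0.50), 3))
```

Under these assumptions most of the attainable gain comes from the first one or two added measures, and the higher the common component (the intercorrelation), the lower the ceiling.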

Evidently, to give a clinician more
than two or three pieces of data about
an assessee is likely to be of little 
value. Some critics go even further, 
claiming that giving extra data ac- 
tually reduces validities by confusing 
the allocation of subjective weights 
to the predictor variables, and by in- 
creasing the variability of the predic- 
tions, i.e., inducing the clinician to 
venture into making extreme judg- 
ments which increase the risk of mak- 
ing large errors. Kelly and Fiske 
claim (1950) that in the Michigan 
study validities declined as more 
data were given to the assessors. Holt 
challenges the accuracy of their inter- 
pretation of the findings (1958, p. 8), 
but even so there are other studies 
that suggest that more data do not al- 
ways improve accuracy (e.g., Gage, 
1953; Giedt, 1955; Kostlan, 1954; 
Soskin, 1954). In Giedt’s study, for 
instance, the clinicians were able to 
make more valid predictions of
mental patients’ personalities from
sound recordings than from sound 
movies. 

But there are several studies af- 
firming that, at least under some cir- 
cumstances, more data do enable 
clinicians to improve their accuracy.
We have already referred to Vernon's 
report (1950) that in the CISSB selec- 
tions the Board of Review was able to 
improve on the assessment board’s 
recommendations by combining these 
with their own interview impressions 
of the candidate. Increased validity 
with increased data is also reported
for the California (MacKinnon, 1951),
Chicago (Stern, Stein, & Bloom,
1956), and Menninger (Holt, 1958) 
assessments. 

We must suspend our verdict on 
the value of multiple data at this 
stage. Evidently there are circum- 
stances that can overcome the limita- 
tions on the ability of a single assessor 
to hold in mind data and to combine 
them. One suggestion worth testing 
is to combine data into subdecisions 
of an increasing degree of molarity, 
until the final molar decision is 
reached. This procedure can assist 
the clinician to consider all of the 
data in reaching his final decision 
and it is analogous to the use of 
structured schedules and rating forms 
that are used by interviewers to con- 
solidate portions of the data as they 
go along. This technique as a general 
aid to clinical judgments seems worth 
experimenting with, although the 
danger must be avoided of giving too 
much weight to the data that are 
presented first. In this respect it
would seem to be wise to seek out 
first the data that are believed to be 
the most valid.

Another way of handling the com- 
bining of data is to use several asses- 
sors, each responsible for one or two 
different techniques or areas of per- 
sonality. This was the method 
adopted, for example, in the CISSB 
assessments. This proposal carries 
over to the general question of using
multiple assessors and we shall con- 
sider it further below. 

Multiple assessors. The practice of 
using more than one assessor in selec- 
tion work is an old one; the assump- 
tion has been that the more assessors 
there are, the more ideas will be 
thrown into the pool and therefore 
the more thorough will be the mar- 
shalling of data. Where ratings are 
pooled, it is also hoped that errors
will cancel each other out. Very little 
experimental material is available on 
the relative value of group versus in- 
dividual judgments in personality 
assessments, but evidence can be 
used from other work on other types 
of group performance (see Kelley &
Thibaut, 1954; Klein, 1956, ch. [...];
Argyle, 1957, ch. 5).

These findings suggest, among 
other things, that accuracy of judg- 
ments increases with the size of the 
group, but the optimum number in 
informal problem-solving groups is 
possibly five, since larger groups re- 
quire formal structuring in order to 
ensure adequate communication of 
information; that compatible mem- 
bership is important in problem-solv- 
ing committees; that democratic 
groups produce more different ideas 
than individuals but fewer per per- 
son; that the quality of group deci- 
sions increases with an increase in the 
skill of the members; that groups are 
quicker at solving problems than in- 
dividuals, although less economical 
in terms of man-minutes. However, 
these findings vary according to the 
type of task concerned, and before 
we can carry them over to personality 
assessment it is necessary to bear the 
type of task in mind. 

Some of the questions that should 
be asked concerning group factors in 
personality assessment are: are group 
ratings more accurate than those of 
the individual members of the group; 
does group discussion by the assessors 
improve accuracy over pooled indi- 
vidual ratings; what is the relative 
value of means, modes, and medians 
as methods of pooling; the ideal size 
of committees; committee ratings 
versus averaging; authoritarian lead- 
ership of assessment committees ver- 
sus democratic; should all of the com- 
mittee members be given the same
data; should both the observations
and the interpretation be made by 
groups? These questions can be con- 
sidered at three points in the assess- 
ment procedures: (a) in making sub-
jective observations of the subjects;
(b) in eliciting data from the subjects;
and (c) in integrating and interpret- 
ing the data, and making the deci- 
sion. 

(a) At the observational level, we 
should expect that the pooled ratings 
of several observers would be more 
accurate than individual ratings, 
since pooling reduces the error vari- 
ance, provided always that the indi- 
vidual judgments have some validity 
in the first place (cf. Kelley & Thi- 
baut, 1954, p. 739). 
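
The claim that pooling reduces error variance can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any study cited here: each judge's rating is assumed to equal the subject's true score plus independent normal error, so every judge has some validity. The function name and parameters are hypothetical.

```python
import random
import statistics


def pooled_rating_error(n_judges, n_subjects=2000, noise_sd=1.0, seed=0):
    """Mean squared error of the average of n_judges noisy ratings.

    Assumption (for illustration only): rating = true score + independent
    normal error with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_subjects):
        true_score = rng.gauss(0.0, 1.0)
        ratings = [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_judges)]
        pooled = statistics.fmean(ratings)  # unit-weight pooling of the judges
        squared_errors.append((pooled - true_score) ** 2)
    return statistics.fmean(squared_errors)


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(k, round(pooled_rating_error(k), 3))
```

Under these assumptions the error variance of the pooled rating falls roughly as 1/k with k judges. If the individual judgments had no validity at all, pooling would only average noise, which is the proviso stated above.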

(b) The value of the group inter- 
view versus individual interviews as a 
means of eliciting data is equivocal 
(see the discussion in Oldfield, 1947). 
A recent study (Glaser, Schwarz, & 
Flanagan, 1958) on the selection of 
supervisors found that individual rat- 
ings based on group interviews by a 
panel of three were no more accurate 
than the ratings made by one inter- 
viewer per candidate. While it is true 
that the group situation may elicit a 
wider sample of behavior than an in- 
dividual interview, it is more difficult 
for an interviewer to evaluate the sig- 
nificance of the group as the stimulus 
to which the candidate is responding. 
However, if the interviewing is con- 
ducted by the chairman only, while 
the other assessors are simply ob- 
servers, this may enable the assessors 
to make more unbiased judgments 
than when they are actually involved 
in the interviewing. This effect still 
remains to be tested empirically. 

(c) The usual consideration of the 
value of group assessment versus in- 
dividual deals with the integration of 
the available data. As in the case of 
group observations, pooled predic- 
tions are more accurate than most or 
all of the individual predictions 
(Klugman, 1947; Luborsky & Holt, 
1957; Travers, 1941). In one study 
(Smith, 1932) of assessing the qual- 
ities of a child on the basis of behav- 
ioral data, the accuracy increased 
with an increase up to 50 of the num- 
ber of assessors whose ratings were 
pooled (there were only 50 assessors 
available). 

Does discussion prior to assess- 
ment increase accuracy? The evi- 
dence on this suggests that it does not 
(Rusmore, 1944; Taylor, 1947; Kelly 
& Fiske, 1951; Oldfield, 1947). As 
Oldfield puts it: “Discussion of the 
merits of candidates merely amounts 
to a somewhat clumsy method of 
averaging the individual judgments 
of the members” (1947, p. 129). 
Whether discussion aids accuracy or 
not appears to depend on the quality 
of the persons who dominate the dis- 
cussion either through their position 
in the group, their personality, or 
their professional standing. Discus- 
sion is justified particularly when 
there is an “expert” as chairman, who 
will actually make the final decision 
in an autocratic manner, but who 
calls on the other members of the 
panel to give him the benefit of their 
opinions. An “expert” is defined, for 
this purpose, as a person who is ex- 
perienced both in assessment and in 
the criterion situations. 

Kelly and Fiske are quite pessimis- 
tic regarding the use of multiple as- 
sessors. “Until some of the major 
sources of error in predictions are 
eliminated, the replications of asses- 
sors and the use of staff conferences 
hardly seems justified for this type of 
prediction” (1951, p. 178). This con- 
clusion is too sweeping. As we can 
see from Table 1, both pooled and 
committee (discussion) ratings have 
justified themselves in some studies. 


Let us conclude this section on the 
“safety in numbers” assumption 
with a proposal to combine the ad- 
vantages of both multiple techniques 
and multiple assessors. The sugges- 
tion is that each assessor be given a 
limited amount of information on 
which to base his assessment judg- 
ments about the candidates, each 
assessor to receive different informa- 
tion. The assessments will then be 
pooled arithmetically. The informa- 
tion supplied may be objective or 
subjective, atomistic or molar, and 
may range from one item of life- 
history, or a test result, to a projec- 
tive test protocol, an interview or the 
observation of behavior in a minia- 
ture situation. This procedure would 
enable a vast amount of data to be 
integrated without problems of 
weighting since unit weights for each 
assessor’s contribution would be ade- 
quate—this would be analogous to an 
inventory that gives unit weight to 
each item. With adequate organiza- 
tion of the assessment program, this 
would permit several assessors to 
contribute to the final assessment so 
that different viewpoints and per- 
sonality theories can be represented. 
This approach seems to be at least 
worth experimenting with. 

Even if it is found that increased 
numbers of assessors increase per- 
ceptibly the accuracy of the assess- 
ments, there is still a fine calculus of 
cost in human time and effort to be 
computed. The decision to augment 
the panel with additional assessors 
is a function, among other things, of 
the gradient of diminishing returns, 
the ability of available extra asses- 
sors, the cost of using them, the ef- 
fects on the candidates, and the desire 
to allow executives in the institution 
to participate in the assessment. The 
proposal made above of having many 
assessors, who contribute small pieces 
of information, may make it possible 
to conduct multiple assessments com- 
paratively cheaply. 


SUMMARY 


[...] personality assessment 
procedures have been analyzed with 
[respect to] their primary purpose and 
the validation strategy used. Prob- 
lems that arise in the attempt to use 
personality assessment for selection 
are discussed with respect to the 
[...] of clinical versus statistical 
[...], the problem of condi- 
[...] factors that affect the criteria, 
and the value of using multiple tests 
[and] more than one assessor. 
Some recommendations: 
1. Use objective techniques as far 
as possible for analyzing the criterion, 
scoring tests, and for making pre- 
dictions. 
2. Give careful consideration to re- 
quirements of the criterion and make 
empirical studies of the link between 
these requirements and both the test 
behavior and the criterion behavior. 
This is a step in construct validation. 
3. As a preliminary to the above 

The word “projection” stems from 
the Latin verb projectus, meaning “to 
cast forward” (Bell, 1948). In the 
field of personality one would be hard 
pressed to find a concept so capable 
of multiple interpretation and so 
varied in meaning as the concept of 
projection. Surely, this concept has 
had more interpretations than the 
smile of Mona Lisa. Sears described 
the situation thus: “Probably the 
most inadequately defined term in all 
psychoanalytic theory is projection” 
(Sears, 1943, p. 121). Murray has 
said “If ‘projection’ means every- 
thing it means nothing” (Murray, 
1951, p. 13). 

Just how many kinds of projection 
are there? It is frankly impossible to 
describe them all. The best one may 
hope for is to group them into more 
or less distinct categories. This 
paper, then, will concern itself with 
attempting to describe various 
“types” of projection, their origins, 
and the research undertaken with re- 
gard to them. 

Only the various kinds of projec- 
tion subjected to experimental in- 
vestigation will be dealt with in de- 
tail. Because they have resulted in 
little research, the conceptual frame- 
works of Bellak (1950, 1954, 1956), 
Murray (1938, 1951), Van Lennep 
(1951, 1957), Rapaport (1942, 1945, 
1952), Schachtel (1950), and Goss 
(1957) will not be considered. 

The research literature seems to 
indicate some four possible cate- 
gorizations of the concept of projec- 
tion: “classical,” “attributive,” “au- 
tistic,” and “rationalized” projec- 
tion. Under the category “miscel- 
laneous” we will review those studies 
in which projection was undefined, or 
in which several different concepts 
were simultaneously investigated. 


Classical Projection 


The concept of projection was 
known several centuries before the 
appearance of Freud. Thus Thomas 
a Kempis stated, “What a man is in- 
wardly that he will see outwardly” 
(Cattell, 1951). The Malleus Male- 
ficarum, written in the Middle Ages, 
gives a clear example of projection. 

For fancy or imagination is as it were the 
treasury of ideas received through the senses. 
And through this it happens that devils so 
stir up the inner perceptions, that is the power 
of conserving images, that they appear to be a 
new impression at that moment received from 
exterior things (Zilboorg, 1935, p. 54). 

At the close of the 19th century, 
Freud gave the following definition 
of projection: “The psyche develops 
the neurosis of anxiety when it feels 
itself unequal to the task of mastering 
(sexual) excitation arising endoge- 
nously. That is to say it acts as if it 
had projected this excitation into the 
outer world” (Bellak, 1944, p. 353). 
With this definition, the use of the 
concept soon became quite wide- 
spread. 


The view of “classical” projection 
most often held currently by many 
psychologists is: “A situation in 
which the ego feels threatened is 
likely to result in the ego's refusing to 
acknowledge the trait and in the sub- 
sequent attribution of the trait to 
the outside world” (Murstein, 1956, 
p. 418). Adherents to this view are 
many, though they differ as to the 
wording used. Only a representative 
sampling is listed (Healy, 1930; Hoff- 
man, 1935; Jelgersma, 1926; Kauf- 
man, 1934; Knight, 1940; Schafer, 
1954; Schaffer, 1945; Symonds, 1949; 
Warren, 1934). The possibility of the 
projection of objectively favorable 
traits has been mentioned by Muhl 
(1943), Hoop (1924), and Janet 
(1947). 


Attributive Projection 


“Attributive projection” has been 
described by many psychologists, in- 
cluding Cameron (1951). A recent 
definition is: “The ascribing of one’s 
own motivations, feelings, and be- 
havior to other persons” (Murstein, 
1957b). It is perhaps the most popu- 
lar of the uses of projection today in 
the field of personality. The con- 
cept’s popularity rests on its broad- 
ness; i.e., unlike “classical” projec- 
tion there is no concern with the S’s 
unconscious, or self-concept. It is 
often sufficient merely to note that 
there is a correlation between some 
characteristic of the subject and some 
statement or prediction he makes 
concerning other persons. Weiss 
points out that “the term projection 
in current usage, refers to every kind 
of externalization, particularly to 
every process in which ideas, im- 
pulses, or qualities belonging to one- 
self are imputed to others” (1947, p. 
358). Nevertheless, the concept has 
strong support (Cameron, 1947, 1951; 
Dymond, 1950; Munn, 1946). Even 
Freud, in Totem and Taboo, said (as 
noted earlier by Bellak, 1956): 


But projection was not created for the pur- 
pose of defense; it also occurs where there is no 
conflict. The projection outwards of internal 
perceptions is a primitive mechanism, to 
which, for instance, our sense perceptions are 
subject, and which therefore normally plays 
a very large part in determining the form 
taken by our external world. Under condi- 
tions whose nature has not yet been suffi- 
ciently established, internal perceptions of 
emotional and thought processes can be pro- 
jected outwards in the same way as sense per- 
ceptions, they are thus employed for building 
up the external world, though they should by 
rights remain part of the internal world 
(Freud, 1955, p. 64). 


Freud’s statements are, thus, 
amenable to an attributive definition 
of projection while comprehending 
projection as an organizing aspect of 
perception. 

Horney also described a nondefen- 
sive kind of projection which, in part, 
is “not essentially different from the 
tendency to assume naively that 
others feel or react in the same man- 
ner as we ourselves do” (1939, p. 26). 
An example of “naive” projection is 
mentioned by Baldwin (1955) in 
describing the child who in a fit of 
anger threatens to maim, murder, 
and demolish another child, and as a 
crowning imprecation, threatens not 
to invite his adversary to his birth- 
day party. 


Autistic Projection 


Perception which is strongly influ- 
enced by the needs of an individual, 
in that the figural aspects of the per- 
ceived object are modified so as to be 
consistent with the need, may be re- 
ferred to as “autistic” projection. 
Murphy (1947, pp. 338 ff.) wrote “so 
wherever our needs differ we literally 
see differently. Much of the process 
of individual perception depends 
upon the force of past wants, the per- 
son’s need to disentangle and restruc- 
ture in terms of the situations with 
which he has had to cope.” Sears 
(1943, p. 121) has said, “it may be 
said in general, that the presence of a 
need or drive provides the antecedent 
condition for the perception of ob- 
jects related to that need or drive.” 


Rationalized Projection 


This type of projection is similar 
to “classical” projection in that the 
[projective] process is held to be un- 
conscious. The projector, however, is 
conscious of his behavior. He at- 
tempts to justify it by inventing a 
rationale. Thus, the person caught 
[buying] on the “black market” says 
in self-justification “everybody else 
is doing it.” Here, the attempt is to 
[convert] neurotic anxiety about doing 
something wrong into objective anxi- 
ety about not getting enough to eat 
(Murstein, 1956, 1957b). Baldwin 
(1955) states: 

[...] a child disobedient and unloving 
[...] resentment. This introduces us to 
another sort of defense against guilt, namely 
[to justify] the rejection so that it no longer 
[arouses] guilt feelings. This attribution of 
[...] to the people toward whom we feel 
[...] is a defense mechanism called “projec- 
tion” (Baldwin, 1955, p. 498). 


Among other supporters of this 
kind of projection are Allport (1939), 
(complementary projection), Fen- 
ichel (1945), The Psychiatric Dic- 
tionary (1940), and Piaget (1926), 
(projection de réciproque). Van Len- 
nep (1951) has referred to this oc- 
currence not as projection, but as its 
correlate. Thus, if a person is fright- 
ened, in order to project, he would 
have to see the object as frightened. If 
he sees the other as the cause of his 
fright and thus frightening, he is per- 
ceiving the correlate of his fright, but 
he is not projecting. Again, however, 
we find Freud in Totem and Taboo 
giving plausibility to a rationalizing 
kind of projection. He states: 

It cannot be disputed th[at] [...] 
[...] which turns a dead man into a 
[...] enemy, is able to find support in 
[...] of hostility on his part that may 
[...] felt as a grudge against 
[...] his love of power, 
[...] else may form [the back]- 
ground of [...] the tenderest of human 
relationships (1955, p. 63). 

At this point it may be of value to 
examine the experimental findings 
with regard to the aforementioned 
kinds of projection. 


EXPERIMENTAL FINDINGS 


Classical Projection 


Sears (1936) was among the first of 
the psychologists to utilize a quanti- 
tative index of projection. He studied 
male college students’ possession of 
the traits of “stinginess,” “obstin- 
acy,” “disorderliness,” and “bash- 
fulness,” by the pooled rating method. 
Projection was measured as follows: 

(1) The degree to which each individual 
demonstrated a given trait was determined by 
averaging the combined ratings assigned to 
him on that trait by other members of his 
house. (2) The amount of a given trait at- 
tributed to others was obtained by averaging 
the ratings a given individual assigned to the 
other members (Sears, 1936, p. 153). 


Projection would have occurred if, 
for example, stingy, noninsightful 
persons saw more stinginess in others 
than did the total group as a whole. 
The results, however, did not support 
a Freudian (“classical”) concept of 
projection, in that projection oc- 
curred for both acceptable and non- 
acceptable (reprehensible) traits. 
Moreover, in the group possessing in- 
sight, a negative correlation was 
found between the strength of the 
trait and the amount attributed to 
others. Sears called this occurrence 
“contrast formation.” Its effect was 
said to be opposite to that of projec- 
tion. 

One might have wished that “pro- 
jection” and “insight” had not been 
treated dichotomously. [The crude]- 
ness of measure due to the lack of a 
specific quantitative score other than 
[“projected”] or “did not project” 
[...] extensive anal- 
ysis. [...] the appearance of a 
spurious element in the data, a fact 
[noted] by Rokeach (1945), and Calvin 
and Holtzman (1953), places Sears’ 
findings in doubt. The difficulty lies 
in the fact that projection as defined 
by Sears was a function of the group 
rating (G), and the difference be- 
tween the group- and self-ratings 
(G-S). Since G appeared in both (G) 
and (G-S), a spurious correlation 
would be expected between (G) and 
(G-S), even if no actual psychologi- 
cal relationship existed. Hence, the 
effect of G should have been par- 
tialled out, something which was not 
done. 
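
The built-in correlation can be seen in a short simulation. This is a sketch under assumptions chosen for illustration only: group ratings G and self-ratings S are drawn independently from the same normal distribution, so no psychological relationship exists between them. The function names are hypothetical and do not come from Sears' procedure.

```python
import math
import random


def pearson_r(x, y):
    """Plain product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spurious_g_correlation(n=5000, seed=1):
    """Return (r between G and S, r between G and G - S) for independent G, S."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # group rating, independent of s
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]  # self-rating
    diff = [gi - si for gi, si in zip(g, s)]     # the (G - S) difference score
    return pearson_r(g, s), pearson_r(g, diff)


if __name__ == "__main__":
    r_gs, r_g_diff = spurious_g_correlation()
    print("r(G, S)     =", round(r_gs, 3))
    print("r(G, G - S) =", round(r_g_diff, 3))
```

With equal variances and independent G and S, the expected correlation between G and (G − S) is 1/√2, or about .71, even though G and S themselves are unrelated. This is the sense in which the effect of G needs to be partialled out before a correlation of this kind is interpreted.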

Zucker (1952) measured projection 
by summing the number of items in 
which an individual (college student) 
said that other people behaved in a 
certain way, but he, the student, did 
not. The approach seems question- 
able. No mention was made of the 
actual possession by the Ss of the 
trait in question. Hence, deviates 
from the population norm would have 
been considered as projectors through 
the use of this methodology. Under 
these circumstances Zucker’s findings 
that “high projectors” had more 
ideas of reference and were more 
ascendant on the Allport Ascendant- 
Submissive scale than “low pro- 
jectors,” seem meaningless. 

Zimmer (1955) presented students 
with three photos and had them 
choose the ones liked best and least. 
The selected photos were then rated 
on a 7-point scale as to possession of 
some 25 traits. The Ss then rated 
themselves on a scale containing the 
same traits. Finally, measures of 
conflict with regard to each of these 
traits were obtained from a word 
association test. 

Zimmer's hypotheses, that accept- 
able personality characteristics are 
projected onto liked individuals, and 
unacceptable characteristics are pro- 
jected onto disliked persons, and that 
the strength of projection is a func- 
tion of the degree of conflict, were all 
substantiated. Similarly, Lundy and 
Berkowitz (1957) found that students 
whose attitudes were influenced by 
peers tended to perceive themselves 
as more similar to these peers than 
students who were negative to peer 
influence. 

Norman, in a series of experiments 
with co-workers Ainsworth (1954), 
and Leiding (1956), investigated the 
relationship of such variables as “pro- 
jection,” “empathy,” and “reality,” 
and found the correlations given in 
Table 1. 

These results apparently indicate 
that the various kinds of projection 
are detrimental to the accurate per- 
ception of others, but they have been 
shown to be an artifact of the pro- 
cedure by Cronbach (1955), Gage 
and Cronbach (1955), and Murstein 
(1957a, 1957b). The criticism by 
Murstein (1957a) of their procedure 
is based on their definitions of pro- 
jection, and is as follows: 
Norman and Ainsworth Projection: 
“A” says that he does not possess a 
certain trait (he answers “no” to one 
of the questions on the GAMIN). 
“A” says that other students of his 
age and sex do possess that trait. Of 
the remaining members of the group 
(college students) from whence “A” 
stems, 51% or more say that they do 
not possess the trait in question. 

Norman and Leiding, Projection A: 

This measure of projection may be 
more readily understood by means of 
an alphabetical shorthand used to 
describe each step (a, b, c, d, e, ...). 

(a) “A” rates “B” as he thinks “B” would rate himself. 
(b) “A” rates himself (“A”) as he thinks “B” would rate him. 
(c) “A” rates “B” as he (“A”) sees “B.” 
(d) “A” rates himself (“A”). 

Projection = (a − c) + (d − b) 


Norman and Leiding, Projection B: 


(e) “A” says most other people will answer a given question in a certain way. 
(f) “A” answers in the same way in judging himself (“A”) as he predicted that most other people would answer. A “projection” point is achieved each time e and f occur together for the same question. Thus, Projection = Σ(e, f). 


TABLE 1 
INTERCORRELATIONS OF “PROJECTION,” “EMPATHY,” AND “REALITY” TAKEN 
FROM THE NORMAN AND AINSWORTH (1954) AND 
NORMAN AND LEIDING (1956) STUDIES 

Authors                 Variables                              r 
Norman and Ainsworth    Projection vs. Reality                 −.41 
Norman and Ainsworth    Projection vs. Empathy                 −.65 
Norman and Leiding      Projection A vs. Refined Empathy       −.41 
Norman and Leiding      Projection B vs. Raw Empathy            .86 
Norman and Leiding      Projection B vs. Refined Empathy       −.69 


The projection score of Norman 
and Ainsworth may be objected to 
(a) because of the use of the subject's 
self-rating as an objective criterion of 
whether or not that subject possessed 
a given trait. This appears hazard- 
ous, particularly in a study measur- 
ing projection. Since the criterion 
for the group was 51% or more, if 
only 2% of the Ss projected, the item 
would have been placed incorrectly 
and thus distorted the measure of 
projection; (b) because of the absence 
of any reliability coefficients for the 
various judgments; (c) because some 
of the traits on the GAMIN have 
moderately high intercorrelations, 
thus lending a degree of spuriousness 
to the results; (d) because in correlat- 
ing “Projection” and “Empathy” 
there is a common component (others 
say they possess the trait) in both 
variables which spuriously inflates 
the resulting correlation. 

Norman and Leiding used two 
other measures of projection which 
have been labeled by the present 
authors as Projection A and B. Using 
the aforementioned alphabetical 
shorthand for operations, their cor- 
relation of Projection with Refined 
Empathy (Raw Empathy minus Pro- 
jection), for which a correlation of 
—.47 was reported, may be described 
as follows: 

Projection vs. Refined Empathy (Raw Empathy − Projection) 

[(a − c) + (d − b)] vs. {(a − g) + [illegible]} − [(a − c) + (d − b)] 

Similarly for Projection B, the 
correlation of Projection with Raw 
Empathy (r =.86) may be described 
as follows: 


Projection vs. Raw Empathy 

Σ(e, f) vs. Σ(e, i) 


We have omitted the descriptions of 
g, h, and i in the interest of space 
since they are not crucial to our dis- 
cussion. 

In the first of these two correla- 
tions, the position of common com- 
ponents is such as to insure the fact 
that as the projection score increases, 
the refined empathy score must de- 
crease due to the common com- 
ponent [(a − c) + (d − b)]. Hence, a 
spurious negative correlation is quite 
expected. In the second of the cor- 
relations, there is again an occurrence 
of identical components, but, here, 
since no component is subtracted, 
the spurious correlation which re- 
sults is positive. 
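
The direction of each spurious correlation can be worked out algebraically. The following is a sketch under simplifying assumptions: P stands for the projection score and R for the raw empathy score (symbols introduced here for illustration), and in the second case the two scores are treated as sums that share one additive component U. This is an idealization of the counting scores described above, not their exact form.

```latex
% Case 1: refined empathy is raw empathy minus projection, R - P.
\begin{align*}
\operatorname{Cov}(P,\; R - P) &= \operatorname{Cov}(P, R) - \operatorname{Var}(P), \\
r_{P,\,R-P} &= \frac{\operatorname{Cov}(P, R) - \sigma_P^2}
                   {\sigma_P \sqrt{\sigma_R^2 + \sigma_P^2 - 2\operatorname{Cov}(P, R)}}.
\end{align*}
% If P and R are uncorrelated, Cov(P, R) = 0, and the correlation is negative:
\[
r_{P,\,R-P} = -\frac{\sigma_P}{\sqrt{\sigma_R^2 + \sigma_P^2}} < 0 .
\]
% Case 2 (assumed additive form): X = U + V and Y = U + W,
% with U, V, W mutually uncorrelated. The shared component alone gives
\[
\operatorname{Cov}(X, Y) = \operatorname{Var}(U) > 0 .
\]
```

In the first case the subtracted component enters the two variables with opposite signs, so the correlation is pushed below zero. In the second, nothing is subtracted and the shared component pushes it above zero.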

Murstein (1956), by means of 
pooled ranks, selected four personal- 
ity groupings (hostile-insightful, hos- 
tile-noninsightful, friendly-insightful, 
and friendly-noninsightful). Using 
the Rorschach Hostility Scale as a 
measure of the hostile content of the 
Rorschach, he found that hostile, 
insightful, people projected more hos- 
tility than any other grouping. In a 
dynamic ego-threatening situation, 
however, the hostile, noninsightful, 
group manifested “classical projec- 
tion” as expected, but, surprisingly, 
the friendly, insightful, group also re- 
acted strongly to threat, by distort- 
ing the examiner’s behavior in per- 
ceiving him as hostile. Thus, the re- 
sults did not wholly jibe with a 
“classical” conception of projection 
but were more amenable to analysis 
in a phenomenological frame of refer- 
ence. 

Both the hostile, noninsightful, 
and friendly, insightful, groups per- 
ceived themselves as friendly. Ob- 
jectively speaking, the members of 
the friendly, insightful, group were 
correct, the members of the hostile, 
noninsightful, group were in error. 
What was important, however, was 
the way in which each individual per- 
ceived himself. The experimental 
findings are consistent with the belief 
of Lecky, that “any value... which 
is inconsistent with the individual’s 
valuation of himself cannot be assim- 
ilated; it meets with resistance and is 
likely, unless a general reorganization 
takes place, to be rejected” (Lecky, 
1951, p. 153). 

In a quite similar design, Page and 
Markowitz (1956) obtained a de- 
fensive and nondefensive population 
from their responses to MMPI items. 
Again, a test of defensiveness based 
on receiving critical comments with 
regard to performance on a pseudo 
“intelligence test” did not signifi- 
cantly differentiate the defensive and 
nondefensive persons. Under per- 
sonal ego-threat, the defensive per- 
sons projected more hostility on the 
examiner than did the nondefensive 
persons. The nondefensive persons 
did, however, show a nonsignificant 
tendency toward projection. 

These two experiments seem to 
support the concept of “classical” 
projection in indicating that defen- 
sive persons do project under ego- 
threatening situations, though they 
may not do so on projective and 


BERNARD I. MURSTEIN AND RONALD S. PRYER 


paper-and-pencil tests. The failure 
of Page and Markowitz to find sig- 
nificant projection on the part of the 
nondefensive group may have been 
due to their different method of meas- 
uring defensiveness, as well as to the 
fact that their group consisted mainly 
of women while Murstein’s consisted 
of men. Lastly, the sex of their ex- 
aminer was not indicated, and may 
have been a woman, while in Mur- 
stein’s study the examiner was a 
man. It seems quite plausible to be- 
lieve that the sex of the subject, as 
well as that of the examiner, influ- 
enced the kind of response elicited. 

It is difficult to draw conclusions 
from the aforementioned “classical” 
projection studies due to the varying 
methods of measuring the concept. 
Murstein’s work pointed out the in- 
fluence of background factors for the 
manifestation of projection, as well 
as the occurrence of projection with 
both “friendly” and “hostile” per- 
sons. It would appear, therefore, that 
the mechanism of projection is best 
understood as a means of attaining 
self-consistency rather than solely as 
a defense mechanism. Within this 
framework, the denial and subse- 
quent projection of favorable traits 
may be readily understood. The 
“tough guy” may vigorously repress 
any inclination of “humanity” to- 
wards others since such behavior 
would be inconsistent with his self- 
perceived role. 


Attributive Projection 


By far the greatest amount of re- 
search has been undertaken within 
the confines of this concept. Sears 
(1937) found a correlation of .64 be- 
tween an inventory measuring self- 
criticism and one tapping ideas of ref- 
erence. Apparently, self-critical per- 
sons perceived others as also being 
critical of them. Thomsen (1941) 
found that in the 1940 presidential 
election, the majority of persons 
sampled were of the opinion that the 
candidate whom they favored would 
win the election. Wallen (1941) 
[polled] 85% (N = 237) of the stu- 
dents [of] a small residential college, 
[asking] them to estimate the percent- 
age of students in the college who 
[held given] opinions on each of three 
[...] issues (war entry, draft, St. 
[Lawrence] seaway), and in addition 
to [state] their own views. A signifi- 
cant proportion overestimated in the 
direction of their own opinions. 
Pace (1954) had 20 Harvard 
[students] rate pictures of college 
[students on] a 10-point scale of happi- 
ness, then rate themselves on a 6- 
[point scale]. Some relationship was 
found between the self-rating and 
[the rating] of the pictures. A tendency 
was found, however, for individuals 
[avowing] extreme happiness, to 
[attribute] less happiness to others. 
This effect may have been a function 
of regression to the mean, low relia- 
bility, and the “ceiling effect” of 
“happy persons.” It will be readily 
apparent that no “extremely happy” 
person (rated 10 on a 10-point scale) 
could perceive anyone as “happier” 
than himself; only “equally happy” 
or “less happy” persons might be 
perceived, making for a negative cor- 
relation between self and others for 
such persons. 
[author illegible] (195[?]) found that children 
who had just seen the film “Peter 
Pan” tended to perceive Peter Pan’s 
age as close to their own. The cor- 
relation for boys was .34, and for 
girls .68, both significant. 
Halpern (1955) had 38 student 
nurses take the GAMIN personality 
[inventory] and had them predict the 
[responses] of other nurses in their im- 
mediate subgroup. Predictions for 
[those] having self scores similar to 
their own were significantly more ac- 
curate than were predictions for dis- 
[similar ones]. The r for similarity 
vs. predictive accuracy was .84. Sim- 
ilar findings are reported by Suchman 
(1956), and Wittich (1956). Alfert 
(1958) found that when item re- 
sponses were socially desirable, “at- 
tributive” projection (Alfert used the 
term “assumed similarity”) increased 
for traits congruent with the perceiv- 
er’s ideal more than items involving 
ideal-discrepant traits. Though the 
persons whose responses were pre- 
dicted were comparative strangers to 
the predictor, they were probably of 
similar education and socioeconomic 
status. Thus, Alfert’s findings that 
more items involving traits congru- 
ent with the S’s ideal self were at- 
tributed to these “strangers” than 
were said to be possessed by the S 
himself seems understandable. One 
may wonder whether such favorable 
perceptions of others would have oc- 
curred among persons of another so- 
cial or economic group. 
Fiedler (1951, 1952, 1953) found 
that persons liked were assumed to be 
more similar to the self than persons 
disliked. Good therapists, for ex- 
ample, showed a greater tendency to 
assume similarity unjustifiably be- 
tween patients and themselves than 
was the case for poor therapists. 
These studies seem to indicate the in- 
dependence of the concept of “at- 
tributive” projection from defensive 
behavior. 
The above findings clearly dem- 
onstrate that people often predict 
that other persons, usually of a sim- 
ilar occupational or social group, hold 
many views similar [to their own]. 
Moreover, these predictions [are often] 
better than chance. [...], 
however, may be subsidiary to the 
fact that many [of the studies in]- 
volved situations [entailing] 
little ego-threat. In addition, most of 
[the studies] involved [...] [cor]- 
relations with their own self re- 
sponses, thus automatically making 
them “projectors.” 

Lundy (1956) has shown that when 
the center of attention is on the self, 
projection is more likely to occur 
than when it is on the other person. 
When, however, an ego-involving 
situation is at stake, it might be ex- 
pected that individuals would be 
more careful in making distinctions 
between themselves and others. 
Thus, Rokeach (1945), studying the 
ego-involving attribute of “beauty” 
among a female population at Brook- 
lyn College, found a nonsignificant 
correlation of —.08 between self- 
ratings and ratings ascribed to other 
females. 

The role of self-involvement as dis- 
tinct from insight is sharply illus- 
trated in a study by Weingarten 
(1949). She had a group of 74 college 
students write autobiographies of 
themselves. Two judges used the 
autobiographies to rate the subjects 
on “tension” and “insight” with re- 
gard to self, family, and the non- 
familial social environment. The Ss 
also were given a series of 75 state- 
ments describing behavioral incidents 
and asked to interpret the psycho- 
logical import of these incidents. Pro- 
jection was measured by the correla- 
tion between the judges’ ratings of 
the subject’s tension as expressed in 
his autobiography, and his inter- 
pretation of the behavioral incident 
inventory. The results appear in 
Table 2. From these Correlations it is 
apparent that “attributive” projec- 
tion is not necessarily related to in- 
sight. Where the “self” was involved, 
both high- and low-insightful groups 
tended to project. Again, it appears 
helpful to view this result as signify- 
ing a “self’’-enhancing tendency for 
most persons regardless of their in- 
sight into their own personalities. 
By perceiving behavior in the behavioral incident inventory as
similar to their own, as described in their autobiographies, the
Ss tended to justify their own behavior as similar to the norm.

TABLE 2

“PROJECTION” CORRELATIONS FOR HIGH- AND LOW-INSIGHT GROUPS FOR
SELF, FAMILY, AND SOCIAL ENVIRONMENT (WEINGARTEN, 1949)

Variable              High-Insight    Low-Insight
                      (N=24)          (N=33)
                      r               r
Self                  .41             .38
Family                .00             .40
Social Environment    .15             [illegible]

A welcome relief from the omnipresent college population is
afforded in a study by Friedman (1955). He used three groups, 16
normals, 16 psychoneurotics, and 16 paranoid schizophrenics, all
of whom Q-sorted items pertaining to their “phenomenological” self
and their “ideal” self. In addition, they were presented with five
TAT cards and their resulting themes were rated as to the presence
of “projected” self.

The findings revealed significant relationships between
“phenomenological” self sorts and “projected” self on the TAT for
the normal and the neurotic groups, but not for the psychotic one.
Considering the “ideal” self sort vs. “projected” self on the TAT,
only the correlation of the normal group proved to be significant.
The correlations revealed no significant differences between
normal and neurotic population behavior concerning the
“phenomenological” self and TAT protocols. Both groups showed a
tendency to reveal something about themselves on the TAT, probably
similar to that given on the Q sort. While the normal group
projected their “ideals” on the TAT cards, however, the neurotic
group did not. Once again, the manifestation of projection seemed
to be a function of self concept. The more anxious neurotic group
was less consistent in its manifestation of “self.”

Some writers have attempted to 
improve the measurement of projec- 
tion by more quantitative methodol- 
ogies than the earlier writers used. 
Rokeach (1945), for example, in his 
study of beauty, designated as in- 
sightful those subjects who in the 
beauty group slightly overestimated 
themselves, and in the homely group 
slightly underestimated themselves. 
His reasoning was that regression to the mean made the group
judgments a slight underestimation of the true score of the
individual. But regression to the mean is a function of
reliability; the greater the reliability, the less the regression
toward the mean. Since no reliability coefficients were obtained
with regard to the group judgments, Rokeach’s method seems
questionable.
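
The dependence of regression on reliability can be illustrated
with a small simulation. The sketch below assumes the classical
test-theory model (observed score = true score + independent
error); that model, the function name `regression_slope`, and all
parameter values are assumptions introduced here for illustration
and are not part of Rokeach's procedure. Under this model the slope
of true scores on observed scores approximates the reliability, so
a slope near 1 means little regression toward the mean.

```python
import random


def regression_slope(reliability, n=20000, seed=0):
    """Estimate the slope of true scores on observed scores.

    Assumes observed = true + independent error, with the error
    variance chosen so that var(true) / var(observed) equals
    `reliability` (hypothetical setup, 0 < reliability <= 1).
    """
    rng = random.Random(seed)
    error_sd = ((1.0 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true]
    mean_t = sum(true) / n
    mean_o = sum(observed) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(true, observed)) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    return cov / var_o


for rel in (0.5, 0.7, 0.9):
    slope = regression_slope(rel)
    print(f"reliability={rel:.1f}  slope of true on observed={slope:.2f}")
```

Without a reliability estimate for the group judgments, the slope,
and hence the amount of regression to allow for, cannot be fixed,
which is the difficulty noted above.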

Bender and Hastorf, in a series of 
experiments, sought an exact quan- 
titative approach to projection (1950, 
1952, 1953). A typical experiment 
(1952) required 50 students to fill out 
an adjustment scale for themselves 
and for the way they thought their 
friends would rate themselves. The 
variables were defined as follows: 

Projection: The total item-by-item 
deviation of the forecaster’s own re- 
sponses from his predictions for an 
associate. Empathy: The deviation 
between the forecaster’s prediction 
for his associate and his associate’s 
self-rating. Refined Empathy: Em- 
pathy minus Projection. Similarity: 
The deviation of a forecaster's self- 
rating from his associate's self-rating. 
The correlations obtained are shown in Table 3.

TABLE 3
CORRELATION OF “PROJECTION” WITH “SIMILARITY,” “EMPATHY,” AND
“REFINED EMPATHY” (BENDER AND HASTORF, 1952)

Variables                            r
Projection vs. Similarity            .32
Projection vs. Empathy               337. [sic]
Projection vs. Refined Empathy       -.58

Similarly, Cowden (1955), in a study of married couples, also
found a substantial correlation between Projection and Empathy.

The results of both studies seem to imply that the more similar
the personalities of two individuals, the more likely projection
is to occur when one individual attempts to evaluate the
personality of the other. The ability to see an individual as he
sees himself (empathy) is related to the tendency toward
projection. If, however, the projection component is removed from
the score of Empathy, then a negative relationship exists between
these two variables.

The conclusions, however, are an 
artifact of the statistical procedure. 
A contributing factor to the spurious 
relationship between projection and 
empathy as defined above is the fact 
that each of these scores had an 
identical component (prediction for 
an associate). The other scores illus- 
trate similar common components 
when they are correlated with each 
other. An objection might also be made on purely logical grounds
to the assumption that projection has occurred if a person
attributes a trait to another which he possesses himself
(Murstein, 1957b).
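
The shared-component artifact can be reproduced with purely random
data. The sketch below is an illustration only, not Bender and
Hastorf's procedure: it assumes 20 five-point items answered at
random and independently for the forecaster's self-rating, his
prediction, and the associate's self-rating (the item count, the
sample size, and the function names are hypothetical). Projection
and Empathy are scored as deviation sums following the definitions
given earlier, and Refined Empathy as Empathy minus Projection.
Because Refined Empathy contains Projection with a negative sign,
the two scores correlate strongly and negatively even though the
data contain no psychological relationship at all.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_scores(n_people=2000, n_items=20, seed=1):
    """Score random, unrelated responses with the deviation formulas."""
    rng = random.Random(seed)
    projection, empathy, refined = [], [], []
    for _ in range(n_people):
        own = [rng.randint(1, 5) for _ in range(n_items)]
        predicted = [rng.randint(1, 5) for _ in range(n_items)]
        associate = [rng.randint(1, 5) for _ in range(n_items)]
        # Deviation of own responses from predictions for the associate.
        p = sum(abs(o - q) for o, q in zip(own, predicted))
        # Deviation of predictions from the associate's self-rating.
        e = sum(abs(q - a) for q, a in zip(predicted, associate))
        projection.append(p)
        empathy.append(e)
        refined.append(e - p)  # Refined Empathy = Empathy - Projection
    return projection, empathy, refined


projection, empathy, refined = simulate_scores()
print("Projection vs. Empathy:        ", round(pearson(projection, empathy), 2))
print("Projection vs. Refined Empathy:", round(pearson(projection, refined), 2))
```

In this sketch the second correlation is produced entirely by the
subtraction, which is the sense in which such coefficients are
spurious.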

In a more recent article, Hastorf, Bender, and Weintraub (1955)
brought out another difficulty with their earlier work. People who
have a tendency to give end scale responses (almost always, almost
never), and who also accurately predict midscale personalities,
assure themselves of high Refined Empathy scores. The reason for
this is that Refined Empathy is equal to Raw Empathy minus
Projection. Hence, the more dissimilar the personalities, the
smaller the projection component subtracted, and the larger the
Refined Empathy score.

By way of confirmation, the au- 
thors reported a rho coefficient of .47 
between the tendency to give end 
scale scores and the Refined Em- 
pathy score. Conversely, a man who 
used the midpoint of the scale most 
frequently in his predictions for 
others, and also used midpoint pre- 
diction for himself, automatically 
became a projector. In short, projection under these circumstances
was a function of response habit rather than any aspect of
personality or cognition.


[...] “complex” judges were more [...] and [...] persons.
Nevertheless, both groups were about equal in predictive accuracy.
In a study by Crow (1957b), medical students subjected to training
in interpersonal relationships manifested greater complexity in
their predictions of patients’ MMPI profiles (i.e., the variance
of their predictions increased as compared to their pretraining
predictive variance) than they had prior to this training.
Nevertheless, their accuracy dropped as a result of this training.
Apparently, by overstressing the “individual,” the basic
similarity of persons or at least of their answering habits may be
underestimated.

In a study involving a plethora of discrepancy scores, Fabian
(1954) defined projection as the rating of “others” by a subject,
minus the mean rating of the subject by the group. Insight was
measured by the self-rating minus the mean rating of S by the
group. Again, because of the component common to both measures
(mean rating of S by the group), a (spurious) positive correlation
was reported for projection and insight.

Bieri and his co-workers (1953, 1955a, 1955b) have been interested
in the relationship of the interaction of persons and changes in
perception. In an early study, Bieri (1953) found that after
interaction for some 20 minutes, college students’ “assimilative”
projection scores (predicting the same responses for another as
one fills out for oneself) tended to increase significantly. The
test used to measure these predictions was the Rosen[...]. Not
surprising was the fact that, although a chance score [...] 8, the
mean score was approximately 13.5 even before [...]. In other
words, the [...] matter will be [...] detail later.

[...] and the operational constructs of predictive accuracy and
“assimilative” projection. Predictive accuracy was composed of
“accurate” projection wherein “A” correctly predicts a re[...]
projection was composed of “accu[...] and “inaccurate” [...]


TABLE 4

CORRELATIONS BETWEEN COGNITIVE COMPLEXITY AND VARIOUS PERCEPTUAL
SCORES (BIERI, 1955)

Predictive Behavior                  r
Predictive accuracy                  .29
“Assimilative” projection            -.32*
“Accurate” projection                .02
Accurate perceived differences       .35*
“Inaccurate” projection              -.40*
Actual similarity                    .20

* Significant at .05 level.

The results in Table 4 seem to show that lack of cognitive
complexity in the perception of others
leads to the perception of others as 
similar in response habit to the fore- 
caster, often with a resulting loss in 
accuracy of perception. The study is 
somewhat weakened by the fact that 
several of the correlations are pre- 
determined, and nonindependent 
from the other correlations. Thus, 
given that the correlation involving 
accurate perceived differences is posi- 
tive, it is most probable that the cor- 
relation containing “inaccurate” pro- 
jection is negative. The reason for 
this may be readily seen. Consider- 
ing all of the responses where the 
forecaster and forecastee differ in 
their self-responses, the following 
truism emerges: if the forecaster accurately perceives these
differences, then he does not inaccurately perceive the forecastee
as similar to himself. In other words, he does not perceive
inaccurately. Hence, anyone with a high accurate perceived
differences score must consequently have a low “inaccurate”
projection score. If both variables are themselves correlated with
the same variable (cognitive complexity of perception), then a
positive correlation for accurate perceived differences (.35)
makes it probable that there will be a negative correlation for
“inaccurate” projection (-.40). In the foregoing table, therefore,
few correlations may be selected without determining the others to
some degree.

In another study, Bieri, Blachar- 
sky, and Reid (1955) used the Incom- 
plete Sentences Blank (ISB), and the 
Manifest Anxiety Scale (MAS), as 
indicators of adjustment. They pre- 
dicted a negative relationship be- 
tween degree of maladjustment and 
accuracy of predicted behavior, and 
a positive relationship between the 
degree of maladjustment and the 
tendency towards “assimilative” projection. The Ss were 33 college
men and 7 college women. The results were opposite to those
predicted. The better adjusted individuals used “assimilative”
projection more than the more poorly adjusted, while the poorly
adjusted were significantly more accurate in perceiving
differences between themselves and others. The better adjusted
tended to perceive others like themselves (r=.71) and they were
largely accurate in this perception (r=.64). The more maladjusted
persons tended to perceive
others as different from themselves 
with high accuracy. Thus, one is 
faced with two different subgroups hav- 
ing good predictive ability with regard 
to other persons, one group being called 
“projectors,” and the other not. What 
more concise proof of the inadequacy 
of the definition of projection? The 
measure of projection is again a func- 
tion of the homogeneity of the group. 
In a fairly homogeneous college 
group there is a high degree of simi- 
larity between responses to various 
questionnaires, and accordingly a 
high degree of “projection” and 
“accuracy” for those able to discern 
group homogeneity. Deviates in
terms of maladjustment are readily 
aware of the differences between 
themselves and the group, and can 
accordingly predict others’ responses 
accurately. Such persons, however, 
rarely “project,” because they are quite different from the
majority of the group. “Projection,” in this sense, is an
artifact, since for both adjusted and maladjusted persons the same
degree of predictive efficiency receives different labels with
different connotations.

The evidence from experiments utilizing “attributive” projection,
therefore, would seem to support the following conclusions:

(a) The use of discrepancy scores 
has resulted in many psychological 
findings which are statistical arti- 
facts. 

(b) “Attributive” projection may 
be related to ego-involvement. 
Whether projection occurs or not 
would seem to be dependent upon the 
relation of such behavior to the self 
concept. 

(c) “Attributive” projection may 
result from such diverse phenomena 
as correctly perceiving similarity be- 
tween another and oneself, or incor- 
rectly perceiving such similarity. It 
may stem either from a lack of in- 
formation, often referred to as 
“naive” projection, or from an ade- 
quate supply of information (ac- 
curately perceived similarity). In 
short, the term connotes such a 
varied number of meanings that it 
possesses little explanatory signif- 
icance without reference to the com- 
position of the group of perceivers, 
their homogeneity, and ego-involvement in the task. Also to be
considered is the variability of the “personalities” to be judged
as well as the nature of the information available about them. The
measurement of “attributive” projection is more likely, therefore,
to reveal the cognitive-response habits of a judge than
information about the dynamics of his personality.


“Autistic” Projection 


One of the earliest recognized distortions in perception has been that
which stems from the manifest needs 
of the subject. In 1904, Külpe found 
that “actual sensory qualities of the 
stimulus which were not relevant to 
the task-set were to all intents and 
purposes not seen” (Helson, 1953, p. 
22). Helson also gives the example of 
the classical “complication” experi- 
ment where the S watches a moving 
pointer and reports its location at the 
moment a bell is sounded. The 
stimulus for which the S is “set” is 
perceived prior to the incidental one. 
“Thus if the bell sounds at 20 objec- 
tively, it is seen at scale division 10 
when the pointer is attended to; if the 
sound is attended to, the pointer is 
seen at 30. This phenomenon is perceptual and is not a matter of
judgment” (Helson, 1953, p. 22).

Murray (1933), in some early work preceding the development of the
TAT, gave some photographs to a group of eleven-year-old girls at
a house party which his daughter gave. The girls rated the
pictures once after a normal pleasurable experience, and once
after a game of “murder.” There was a considerable increase in the
degree of maliciousness attributed to the pictures after the
second game. That even nonambiguous tasks may elicit projection of
emotional states has been reported by Johnson (1937a, 1937b), who
found that normal but euphoric persons tended to overestimate
distances between two points, while normal but depressed subjects
tended to underestimate these same distances.

Sanford's two studies (1936, 1937) 
on the effects of abstinence from food 
on imaginal processes were the pre- 
cursors to the “New Look” in per- 
ception. He gave 10 children words 
to associate and ambiguous pictures 
to interpret, both before and after 
meals. The subjects gave significantly more food responses before
a meal than after one. McClelland and Atkinson (1948) have also
found increases in food responses as a function of periods of
deprivation not exceeding 16 hours.

Levine, Chein, and Murphy (1942), 
using food deprivation periods of 3, 
6, and 9 hours, found that a deprived 
group gave more food responses after 
3 and 6 hours than at the start in re- 
sponse to chromatic ambiguous fig- 
ures projected on a screen. After the 
9th hour, however, they manifested a 
decrease. Similarly, Brozek, Guetzkow, and Baldwin (1951), working
with a group of men subjected to semistarvation for 24 weeks,
measured the perception of food responses through the use of
direct questioning, the Rosenzweig Picture-Frustration Test, the
Kent-Rosanoff Free-Word Association Test, and the Rorschach. The
number of food responses showed no significant change as a
function of time. Apparently the
projection of deprived need is not a 
simple phenomenon. Projection oc- 
curs only when the S believes there is 
some chance of immediate gratifica- 
tion. When it becomes apparent 
that gratification will be delayed 
there is no projection. Projection is 
once again seen to be subsidiary to 
the need-gratification expectation of 
the perceiver. 


“Rationalized” Projection


The data with regard to “rationalized” projection have supported
this meaning of the concept in the few experiments reported. In a
study at
the University of Vienna, Frenkel- 
Brunswik (1939) had four judges 
rate the conduct of some 40 students 
as well as write a description of their 
personalities. The students wrote an 
autobiography concerning their con- 
duct at the University, the principles 
guiding their conduct, and the 
changes they felt should be made. 
She found that many students felt 
that the environment should compensate for their personal
shortcomings. “In this instance the subject
seems not to realize his own defect, 
but rather to project it on the en- 
vironment” (1939, p. 418). For ex- 
ample, the rho correlation between 
the rating of lack of scientific ability 
by the judges and comments by the 
students of a need for pedagogical 
changes was .60. A lack of discipline 
on the part of the students was ra- 
tionalized by their demand for more 
regimentation (rho =.62). Moreover, 
the overambitious persons often 
checked “I always do what I am 
ordered to do,” and the aggressive 
ones asserted “I do not let myself be 
intimidated.” 

In a unique and imaginative ex- 
periment, Posner (1940) gave a 
group of eight-year-old children two 
toys to play with, one preferred, and 
the other nonpreferred. They were 
then asked to give one to a friend to 
play with, after which each child was 
asked which toy he thought the friend would have given away. A
selfish judgment (friend also would have given away the
nonpreferred toy) was considered to indicate projection. The
control situation,
created to avoid guilt feelings, and in- 
volving another matched group, did 
not require the child to give away 
one of the toys. Under these circum- 
stances there was a much smaller de- 
gree of projection of selfishness. 

Bellak (1944) found that the ag- 
gressive word content increased in 
TAT stories in the last five cards, 
when subjects were severely criti- 
cized, after receiving the first five 
cards without comment. Although Bellak interpreted this as “true”
(Freudian) projection, he reported that when criticized for the
poor quality of their stories, the subjects admitted their
inability but offered excuses. They blamed the “ambiguity” of the
pictures and the “inadequacy” of the instructions for their poor
performance. Apparently, the subjects were producing
“rationalized” projection rather than “classical” projection.
Their stories manifested considerable hostility of which they were
aware, but which they could not express because of the
authoritative position of the examiner (Murstein, 1957b).


Miscellaneous Studies of Projection 


Several experiments could not be 
categorized either because of a lack 
of information concerning the con- 
cept of projection or because the aim 
of the experiment was to arrive at 
some refinement of categories (Cattell & Wenig, 1952). Holt (1951)
asked 10 judges to rate each of 10 college students on 36
personality variables, including “projection,” on a 6-point scale.
“Self-insight” was the only concept not rated as such, but was
obtained as the sum of the squared differences between the ratings
of the judges and the subjects. The correlation between
“projection” and “self-insight” was .50,
which, because of the paucity of 
subjects, did not quite reach signif- 
icance at the .05 level. The correla- 
tion of “projection” with the six most 
attractive traits was .54; with the 
six most unfavorable traits .04. “In- 
telligence” was found to correlate .48 
with “projection.” These rather un- 
usual findings are difficult to inter- 
pret because one does not know what 
conception of projection each judge 
held. Another difficulty is the lack of 
information as to the relative number 
of favorable traits compared to unfavorable ones. If one
hypothesizes that the judges were using an attributive concept of
projection without reference to the self concept of the S, then
the results are somewhat more understandable. In a fairly
homogeneous select group (Harvard collegians), the smarter,
perhaps better adjusted students would tend to refer to others as
similar to themselves (probably quite correctly), and to be
perceived by the judges as “projectors,” since they would have
attributed characteristics to others that they themselves
possessed. Once again, in this eventuality, one would deal with
persons classified as “projectors” solely through the small amount
of group variance resulting from group homogeneity.

The Blacky Test was used by Cohen (1956) on a college group in
which Ss were rated on the dimensions of projection, regression,
reaction-formation, and avoidance. The Ss worked together on a
task and evaluated each other on interpersonal ability. Cohen
found that “projectors” in a group showed more negativeness and
hostility in interpersonal relationships towards each other than
when in other groups.
When paired with “nonprojectors,” 
however, “projectors” were not more 
threat-oriented than other dissimilar 
pairs. The hierarchy of defense re- 
sponses with regard to perceived 
negative interaction was (a) projec- 
tion, (b) regression, (c) reaction for- 
mation, and (d) avoidance. 

These interesting results indicate 
that projection, as measured by the 
Blacky Test, seemed to give an in- 
dication of noncommunication re- 
sulting from the perception of ex- 
treme threat to the self. It would 
have been helpful if the basis for de- 
ciding who was, and who was not, a 
projector had been discussed in 
greater detail. 

Cattell has concerned himself with 
the various concepts of projection 
(1944, 1951). An ambitious attempt 
to factor-analyze some of the different 
conceptions of projection was made 
by Cattell and Wenig (1952). The authors used the term
“misperception” because they felt that they were dealing with
measurement of a discrepancy between individual reactions to the
TAT-like pictures, which they used, and some superindividual
standard of reaction. They believed that “misperception” is
affected by three considerations: (a) abilities, (b) experiences
(information, skills), and (c) the dynamic needs of the subject.
Their hypothesis was that the nature and magnitude of
misperception effects generally are to be accounted for by eight
misperception factors within the Ss. These were: (a) cognitive
(intelligence), (b) cognitive (information concerning the field),
(c) cognitive (information concerning the principal person), (d)
consciously accessible dynamic traits (autism), (e)
press-compatibility, and the defense mechanisms: (f) projection,
(g) rationalization, and (h) phantasy.


A factor analysis of the stories selected (8 choices were
possible, one for each hypothesized factor), and other “marker”
variables did not seem to bear out the hypothesis. Various factors
were teased out of the data and given labels of “phantasy,”
“naïveté,” “autism,” “rationalization,” and “true projection.”
Still, it is evident from the examination of loadings on each
factor that several kinds of “misperception” are represented in
each factor. On the “naïveté” factor, the “naïveté” measures range
from .77 to .48. Nevertheless, a “press” loading of approximately
.40, as well as an “autism” loading of equal magnitude, was
present. The “autism” factor contained substantial loadings for
“rationalization,” as well as for “autism.” The “true” projection
factor was loaded with “press” projection, “phantasy,” and
“naïveté.”

Cattell and Wenig seemed mostly concerned with the high “press”
loading on the “true” projection factor. They stated that a single
underlying mechanism might be involved both in projecting an
unconscious drive and in reconciling external facts with conscious
internal moods, namely, a need for self consistency. “If so, the
underlying process would be better labeled as ‘self-saving’. . . .
Secondly, and far more important, the misperception is not due to
a single process, projection, but to several dynamic misperception
processes, which superimposed, frequently act in different
directions” (1952, p. 807).

While Cattell and Wenig’s con- 
clusions were based on an extensive 
analysis of the data, there appears to 
be a serious difficulty with regard to 
their operational conception of what 
constituted the various kinds of “misperception” or “projection”
as shown by these examples of what they regarded as different
kinds of projection: the woman in the picture is dominating the
man because he wants her to . . . he enjoys being dominated
(autistic misperception); the man kneeling at the boy’s side is
dominating the boy because the boy is a very submissive person
which necessitates that he be led (press-compatibility
misperception); the older man is the boy’s father who dominates
the boy for his own good (rationalization).

One wonders whether any basic 
differences exist between these three 
themes. All of them seem to embody 
a “rationalized” type of projection. 
The difficulty in detecting clearly in- 
dependent kinds of projection may 
very well be traced to the noninde- 
pendence of the choices presented 
to the Ss. It would be of interest to repeat this study using
valid operations for distinguishing the different types of
projection.

Lastly, Jenkin (1956) had Ss look at pictures of varying ambiguity
projected on a screen. “Projection” was said to occur when the
subject seemed sure of the objective reality of his
interpretation. “Rationalization” occurred when the subject made
his interpretation in a tentative manner. Correlations of this
version of projection with various Rorschach and Rosenzweig P-F
determinants were reported. Projection was significantly
correlated with the Rorschach’s M% (.38), Rosenzweig’s E% (.36),
and E-D% (.69). Negative correlations were found for Rosenzweig’s
M% (-.35), and N-P% (-.54). “Rationalization” was positively
correlated with Rosenzweig’s M% (.42), and negatively correlated
with his E% (-.38) and the Rorschach’s W% (-.49), and M% (-.43).
Obviously, “projection” here meant a confident assertive method of
reporting perceptions while “rationalization” signified a Casper
Milquetoast-like approach.

The correlation between projection and projection/rationalization
was reported as .94, while that of rationalization with
projection/rationalization was found to be -.40. Both are
artifacts containing spurious common elements. In the first case,
the spurious element in the numerator (projection) assured a
positive correlation, while in the latter case, the common element
in the denominator (rationalization) provided an artifactual
negative correlation.
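
The direction of both ratio correlations can be shown to follow
from the arithmetic alone. The sketch below is a hedged
illustration and does not use Jenkin's data: it assumes two
independent, uniformly distributed scores standing in for
“projection” and “rationalization” (the score range, sample size,
and function names are hypothetical) and correlates each with
their ratio.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ratio_correlations(n=5000, seed=2):
    """Correlate two unrelated scores with their own ratio."""
    rng = random.Random(seed)
    # Independent scores: by construction they share no real relationship.
    projection = [rng.uniform(1.0, 10.0) for _ in range(n)]
    rationalization = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ratio = [p / r for p, r in zip(projection, rationalization)]
    return pearson(projection, ratio), pearson(rationalization, ratio)


r_numerator, r_denominator = ratio_correlations()
print("projection vs. ratio:     ", round(r_numerator, 2))
print("rationalization vs. ratio:", round(r_denominator, 2))
```

Under these assumptions the numerator correlates positively and
the denominator negatively with the ratio, so coefficients of this
kind say nothing about the variables themselves.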


The “Operational” Dilemma 


The adherents of the psychoanalyt- 
ic school of personality measurement 
have been strongly attacked for their 
failure to define operationally such 
concepts as “projection” and “repression.” These criticisms,
justified to a large extent, have led the more recent
psychodynamically-oriented researchers to embrace more operational
definitions of such variables as “projection,” “empathy,”
“reality,” and “insight.” Unfortunately, the result of zealous
operationism has been a neglect of the original psychological
meaning of these variables. Instead, we are faced with a string of
operational phrases of the “‘A’ rates himself (‘A’) as he thinks
‘B’ would rate him” variety. When one has
plodded through the adding and sub- 
tracting of the 8 operations necessary 
to arrive at a score of “Refined Em- 
pathy” or the 12 operations used in 
correlating “Refined Empathy” with 
“Projection” one may be sorely 
tempted to return to the “good old 
days” of literary definition. What is 
needed is an operational definition 
which does not depart from the accepted psychological meaning of a
concept. It is clear that much of the research reviewed in this
paper has failed to fulfill this need. Instead, judges have been
asked to predict, according to Gage and Cronbach (1955), such
diverse operations as:


(a) how persons in general will behave;
(b) how a particular category of persons deviates from the
behavior of persons in general;
(c) how a particular group deviates from the typical behavior of
the particular category it belongs to;
(d) how an individual deviates from the typical behavior of the
particular group he belongs to;
(e) how an individual on a particular occasion will deviate from
his typical behavior (Gage and Cronbach, 1955, p. [...]).

One is faced here with many abilities whose relationship to
personality patterns is rather slight. Cronbach (1955) mentions
several of these abilities including “differential elevation,”
measuring the forecaster’s ability to judge deviation of the
individual’s elevation from the average; “stereotype accuracy,”
measuring the judge’s accuracy in predicting the “generalized
other”; “differential accuracy,” reflecting the judge’s ability to
predict differences between subjects on any item. The shrewd judge
is often one whose predictive variance (σ_y) does not exceed the
self predictive variance of the person being judged (σ_x).
“Accuracy is improved as σ_y approaches r_xy σ_x. That is to say,
the variation in predictions should never exceed the variation in
true responses and should ordinarily be much smaller” (Cronbach,
1955, p. 181).
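
Cronbach's statement can be checked numerically under one
simplifying assumption, introduced here for illustration: that
accuracy is scored as the expected squared discrepancy between
prediction y and true response x, with equal means. In that case
the expected squared error is σ_y² - 2 r_xy σ_y σ_x + σ_x². The
sketch below (function names and the values σ_x = 10, r_xy = .40
are hypothetical) searches a grid of prediction spreads for the
one giving the smallest error.

```python
def expected_squared_error(sd_pred, sd_true, r):
    """E[(y - x)^2] when predictions and true responses share a mean."""
    return sd_pred ** 2 - 2.0 * r * sd_pred * sd_true + sd_true ** 2


def best_prediction_sd(sd_true, r, steps=1000):
    """Grid-search the prediction spread that minimizes expected error."""
    candidates = [2.0 * sd_true * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda s: expected_squared_error(s, sd_true, r))


sd_true, r = 10.0, 0.40
best = best_prediction_sd(sd_true, r)
print("error-minimizing spread of predictions:", round(best, 2))
print("r_xy * sigma_x:                        ", round(r * sd_true, 2))
```

Under this assumption, a judge whose predictions vary as widely as
the true responses is less accurate than one who contracts his
predictions toward the mean in proportion to r_xy.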

Crow (1954) found predictions to be accurate for the “generalized
other” person, when the group for
whom predictions were made was 
(a) homogeneous and (b) judges ex- 
hibited response sets which were 
general over “others.” In another 
study, Crow and Hammond (1957) 
found that the response set of medical 
students (perceptual stereotypy of 
patients’ responses to a personality 
scale) was more consistent over time 
than differential accuracy (the abil- 
ity of S to predict differences be- 
tween other people on any item). 

Eight keys to interpersonal per- 
ception are listed by Gage, Leavitt 
and Stone (1956) as affecting ac- 
curacy. The keys are hypothetical 
protocols derived from various con- 
structs and compared with the fore- 
caster’s prediction for Sand S's self- 
response. The keys found to be im- 


portant in determining responses 
were: 
A Priori Keys:

1. Acquiescent Tendencies on the
Part of the Judge and S. If both are
highly acquiescent, the former in predicting,
and the latter in answering
personality inventories, accuracy of
interpersonal perception will be high.

2. Favorability of the Judges’ Predictions
and of the S’s Self. If the
items possess high social desirability
and if the S wishes to appear in a
favorable light, there will again be
high accuracy of perception by the
judges.
3. Adjustment Tendencies Derived
from Adjustment Value of Responses.
These are closely allied to the “favorability”
tendencies.

Keys Obtained by Varying Instruction
to Judges:

4. Judge's Self-Description. Judges 
may be accurate because of high sim- 
ilarity to the subject and their as- 
suming similarity in their predic- 
tions, or they may be dissimilar, and 
assume little similarity and still be 
accurate. 

5. Stereotypy. By predicting the 
typical member, a stereotypy key is 
obtained. Accuracy ensues when the
judge follows his stereotype and the
subject is closely similar to it.

6. A manifest stimulus value key 
may be derived from the modal 
judge’s descriptions of the subject. 
This key relates to the impression S 
makes on those judging him. When 
combined with S’s self-perception, it
yields an “insight” or “frankness”
score.


Keys Based on Central Tendencies of 
Predictions or Self Descriptions 

7. Modal Prediction keys may stem 
from (a) the average prediction of n 
judges for a single subject, and (b) 
the average prediction by a single 
judge for many subjects. The former 
key will result in high accuracy when 
judges make highly typical predic- 
tions for highly predictable others. 
High accuracy also will occur when a 
judge makes an atypical correct pre- 
diction for the way S will answer an 
item, while the majority of judges 
make a stereotyped but incorrect 
choice. The latter key (b) gives a 
measure of a judge’s “implicit stereo- 
type.” 

8. A modal self-description of the 
subjects may be utilized against a 
single subject’s self-responses and a 
single judge’s prediction. Accuracy 
then depends upon the judge’s stereo- 
type of the group as well as the in- 
dividual’s deviation from the stereo- 
type of the group. 
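
The central-tendency keys (7a, 7b, and 8) can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from Gage, Leavitt and Stone: items are scored 1 (agree) or 0 (disagree), a "modal" key is taken as the item-wise majority answer, accuracy is taken as the proportion of matching items, and the data and the names `modal_key` and `agreement` are hypothetical.

```python
def modal_key(rows):
    """Item-wise majority answer (1 = agree, 0 = disagree) over response vectors."""
    n = len(rows)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*rows)]


def agreement(a, b):
    """Proportion of items on which two response vectors match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical data: predictions[j][s] is judge j's predicted answers for subject s.
predictions = [
    [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 1]],  # judge 0
    [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 1]],  # judge 1
    [[1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]],  # judge 2
]
# Hypothetical self-responses, one vector per subject.
self_responses = [[1, 1, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1]]

n_judges = len(predictions)
key_7a = modal_key([predictions[j][0] for j in range(n_judges)])  # all judges, subject 0
key_7b = modal_key(predictions[0])   # judge 0 over all subjects ("implicit stereotype")
key_8 = modal_key(self_responses)    # modal self-description of the group

print("modal prediction vs. subject 0's self-response:", agreement(key_7a, self_responses[0]))
print("judge 0's implicit stereotype vs. group modal self:", agreement(key_7b, key_8))
print("judge 2's raw accuracy for subject 1:", agreement(predictions[2][1], self_responses[1]))
```

Comparing a judge's raw accuracy with the agreement obtained from a modal key gives a rough sense of how much of that accuracy a stereotype alone could account for, which is the point the keys are meant to expose.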

It is apparent that these various
“keys” do not reflect very much
about defense mechanisms. Instead,
they seem to consist of cognitive
response habits in which a
fastidiousness for operationism has 
resulted in a growing aridness of 
conceptual meaning. One may well 
speculate as to the multitude of 
factors which would result from a 
factor analysis of the current opera- 
tional definitions of “projection.” It 
is therefore hardly remarkable that 
projection has been found to corre- 
late both positively and negatively 
with such variables as “insight” and
“empathy.” The dilemma can be 
resolved by insisting that operation- 
ism be utilized jointly with clinical 
meaning rather than merely sup- 
planting the latter. 


Discussion 


From birth, man views the world 
imperfectly, using his sensitive but 
far from perfect eye as a camera to 
bring him into communication with 
the outer world. While it is true that 
states correlated with cortical im- 
pulses are projected to the supposed 
location of the stimulation, such a 
concept of projection seems too 
broad to be of use. We propose to 
limit the definition of projection to 
perception or judgments having to 
do with the personality of the organ- 
ism, thereby eschewing any physio- 
logical or cognitive components. A 
definition which is broad enough to 
cover the material reviewed in this 
article is as follows: 

Projection: The manifestation of 
behavior by an individual which in- 
dicates some emotional value or need 
of the individual. 

Such behavior may vary in the degree 
of defensiveness, depending upon the 
situational context and the personal- 
ity of the perceiver. Thus, where 
little or no ego-threat is involved, a 
person may project his emotional 
values through his method of organizing
and selecting his personal
milieu and the objects inhabiting it.
One might term such behavior as 
“life style” projection. The fact that 
these behaviors are public usually in- 
dicates that they involve little threat 
to the “self.” “Autistic” projection 
illustrates the reaction to strong 
needs, but does not necessarily mean 
these needs are defensive ones. For 
one who eagerly awaits a visit from 
Aunt Agatha, who will be wearing a 
green coat, any middle-aged woman 
emerging from the train wearing a 
green coat may be momentarily mis- 
perceived as the expected relative. 
In “rationalized” projection, how- 
ever, we deal with data in which, 
despite the fact that the content is 
readily accessible to consciousness, 
the motivation is distorted so as to de- 
ceive with regard to the real intent 
of the act. The peak of distortion, 
“classical” projection, occurs when 
even the content must be denied because
of the extent of the threat, and
the individual sees the unwanted 
behavior as stemming wholly from 
outside sources. 

It will be noted that the concept of 
“attributive” projection has been 
omitted from this discussion. The
reason for so doing is that the
operation of saying another would do
as the S would do need not involve
any emotional values. It may stem
instead from (a) a shrewd knowledge
of the other person’s behavioral tendencies
(perceiving correctly actual
similarities between the “other” and
oneself), (b) dull intellect (perceiving
another as similar when he is not),
(c) lack of information (“naive” projection),
or (d) the differing needs of
the person, which are either relatively
nondefensive (“autistic” projection),
or strongly defensive (“rationalized”
or “classical” projection).

In general, we have been critical of
the “operational” school of researchers
who have attempted to
measure projection on the basis of a
[...] response regarding himself
and others through the use of
[...] inventories and questionnaires.
The correlations accruing
from these studies have usually been
spurious in containing identical operations
in the variables correlated.
[...] the number of correlations
has often exceeded the available degrees
of freedom.

In addition to being parsimonious,
[...] measuring projection
should be closely allied to the personality
construct being investigated,
so that one is not confronted with the
[...] of coldly objective, uninvolved
persons mysteriously emerging
as “projectors.” Projection
should not be a function of cognition,
but of emotional involvement.

[...] the goals set forth,
the measuring instruments should
embody the following considerations:

(a) The instrument should have a
[...] consistency for any
[...] intermediary keys not related
to the investigation.

(b) If the composition of the experimental
group is in the hands of
[...], the group should be
as heterogeneous or representative
as possible (Crow, 1957a). This
would also check against a lucky
guess in which one person was generalized
for a similar population, thus
[...] a spuriously high predictive
efficiency score. The representativeness
of a group might also prevent
spuriously poor predictions when
several individuals judge only one
person who well may be an atypical
member of his group.
(c) It is extremely unlikely that a
standard stimulus (ambiguous picture)
will elicit the same personality-meaningful
reaction in every person.

An astonishing number of advances
have been made recently by pharma- 
cologists, biochemists, and neuro- 
physiologists into possible causative 
chemical factors of schizophrenia and 
other psychoses. As a result of the 
reported efficacy of the “tranquil- 
izers” and “psychic energizers” in the 
management of some psychotic patients,
many plausible, fruitful hypotheses
have been generated and very
often verified. As these advances
continue to be made, the conviction
grows among workers in the area that
the organic (chemical) changes recently
discovered in schizophrenia
are relevant to its etiology.

The purpose of this review is to 
make available an introductory overview
of several of the outstanding
theoretical formulations of the bio- 
chemistry of psychotic behavior for 
the experimental as well as the clini- 
cal psychologist. These theories 
should also prove useful in providing 
some rational basis for several chemo- 
therapies currently employed with 
psychiatric patients. However, prior 
to any such consideration it is of in- 
terest to review the results of some 
broad biochemical investigations 
which in the absence of specific hy- 
potheses nevertheless uncovered in- 
teresting correlates which await in- 
tegration into an as yet undefined 
unified biochemical theory of schizo- 
phrenia. 


Some Physiologic and Biochemical 
Correlates of Psychotic Behavior 
Any attempt to describe, even briefly,
all or even a significant number
of the studies that have shown
correlation between physiological or
biochemical factors and psychotic behavior
would necessitate a separate
review or even a volume. Instead, a 
limited number of studies will be cited 
that are representative of the broad 
spectrum of research areas that have 
been pursued intensively. Some 
workers have focused their attention 
on the possibility that schizophrenia 
may be due to a cerebral toxin which
is the by-product of a metabolic def- 
icit. Thus, it has been reported 
(Fischer, 1953) that the blood of 
schizophrenics was toxic to tadpoles 
while Federhoff and Hoffer (1956) re- 
ported that schizophrenic blood was 
toxic to fibroblasts nurtured in a tis- 
sue culture. However, this toxicity 
does not appear to be due to a toxic 
chemical specifically found in schizo- 
phrenic blood because the latter 
workers also found that the blood of
patients undergoing surgery was
also toxic. These findings suggest
that some substance, toxic to
tadpoles and fibroblasts, is found in
the bloodstream of people exposed to
severe stress and is also found in the
blood of schizophrenics. Reider
(1957) employed Witt’s spider-web
technique to determine the effect of
schizophrenic urine on the complex
perceptuomotor [...] required
in web building and found
significant abnormalities in [...]
pattern. Örström has focused
his attention on phosph[...]
metabolism and has reported [...]
turnover of adenosine triphosphate
and a higher content of
[...]colic acid in the erythrocytes of
schizophrenics. Still another aftermath
of metabolic impairment may
be reflected in lowered concentrations
of carbonic anhydrase in the
occipital cortex of schizophrenics and 
in early cases in the frontal cortex as 
reported by Ashby (1950). Richter 
(1957) has stated that the different 
enzyme systems of the liver seem to 
be impaired in schizophrenics and 
that different enzyme systems may 
be affected in varying degrees. How- 
ever, he emphasizes that the enzy- 
matic malfunction need not be the 
invariable concomitant of schizo- 
phrenia. 

In addition to these findings of dis- 
turbed enzymatic reactivity or meta- 
bolic defect, there are physiological 
studies that have employed measure- 
ments of body temperature, heart 
rate, basal metabolic rate, and thy- 
roid activity in normals and schizo- 
phrenics which demonstrate a greater 
variance of the measurements in 
schizophrenics than in healthy peo- 
ple. These results suggest that the 
schizophrenias represent a generalized
disorder of steady-states. In
summary, the few empirical findings
reviewed up to this point demonstrate
that schizophrenia may be
characterized by some disordered
activity of [...], a disordered
homeostasis which is reflected in the
measurement of various steady-states.


“M-SUBSTANCE”


An important biochemical theory 
of schizophrenia was formulated 
within this decade by Osmond and 
Smythies (1952), and elaborated by 
Hoffer, Osmond, and Smythies
(1954). The reader may be interested
in the logical development of this
theory. It was Hoffer (1957) who
claimed that schizophrenia is a dis- 
ease of the autonomic nervous sys- 
tem. Although vague as to the 
nature of the primary defect, whether 
it is a constitutional biochemical fac- 
tor or psychogenic, the consequent 
events are each stated with war- 
rantable assertability. A dominant 
feature of the autonomic disturbance 
is an increased parasympathetic ac- 
tivity or pronounced central choliner- 
gic activity. This increased produc- 
tion of acetylcholine, by stimulating 
sympathetic ganglia, in turn pro- 
duces an increase in the secretion of 
norepinephrine and epinephrine. 
Whereas in normal individuals epin- 
ephrine is metabolized by amine 
oxidase or sulfoesterase, in schizo- 
phrenia defective metabolism of 
epinephrine, by a phenolase rather 
than the amine oxidase or sulfoes- 
terase, results in the production of 
quinone indoles (adrenochrome and 
adrenolutin) which interfere with 
cerebral metabolism. To prove this
assertion that a cerebral toxin which 
is the by-product of some faulty 
metabolic process is the primary 
etiological factor in schizophrenia re- 
quires that a metabolite be isolated 
from schizophrenics which when in- 
troduced into healthy people should 
produce schizophrenia; or, the toxic 
metabolite should be found only in 
those people afflicted with schizo- 
phrenia; or, the toxin should be found 
in larger quantities; or, greater sensitivity
to the metabolite by schizophrenics
must be demonstrated. The
next step in their analysis required 
the isolation of one metabolic fault 
out of the myriad metabolic processes 
that characterize life. They ap- 
proached this problem by emphasiz- 
ing the reported relationship be- 
tween certain classes of chemical 
compounds and disordered behavior. 
Specifically, the “hallucinogens” had
the property of producing in healthy
people effects judged similar to the
symptoms of schizophrenia. It
should be emphasized that these
theorists did not equate schizophrenia
with the state produced by
LSD (lysergic acid diethylamide) or
mescaline. They viewed the hallucinogens
as capable of engendering a
behavioral complex which could serve
as a “model psychosis.” The last
step in the theoretical formulation re- 
quired the identification of some 
endogenous metabolite which had a 
chemical structure similar to that of 
mescaline and the theorists pointed 
to adrenaline. However, as adrena- 
line is an essential neurohumor, it 
was proposed that derivatives of 
adrenaline produced as a result of 
faulty metabolism may serve as the
“M-Substance,” the mescaline-like
toxin of schizophrenia. Axelrod 
(1957) was soon to provide a clue as 
to the manner in which the substance 
could be formed when he found that 
methylation of the phenolic groups of 
adrenaline took place in vivo. Res- 
nick, Wolfe, Freeman, and Elmedilah 
(1958) reported that adrenaline is al- 
most entirely detoxified by the proc- 
ess of O-methylation in human be- 
ings. These chemical findings tended 
to support the speculations of the 
early theorists that faulty metabo- 
lism of adrenaline, possibly excessive 
methylation, could produce the toxin 
in schizophrenics. 

The quest for the exact description 
of “M-Substance” was given further 
impetus when it was claimed that 
“pink adrenaline” could produce
psychotic-like reactions in healthy 
people when injected. “Pink adrena- 
line,” it should be explained, is the 
resultant mixture of auto-oxidized 
products of adrenaline obtained when 
adrenaline is left exposed to light and 
air for some period of time. It was 
now suggested that one of the
oxidized products such as adrenochrome
or adrenolutine (trihydroxy
N-methyl indole) or some other com- 
ponent of the oxidized mixture was 
the endogenous toxin, the “M-Sub- 
stance.” In the presence of many 
stabilizing substances in the blood, 
however, it is difficult to conceive of 
auto-oxidation of epinephrine in vivo. 
Experiments have been performed in 
an effort to identify the active com- 
ponent of the oxidized mixture which 
is responsible for the production of 
disturbed behavior. Obviously, it is 
to be expected from the foregoing 
that the injection of some toxin of the 
oxidation products of adrenaline 
should produce aberrant behavior in 
healthy people, or, the toxin should 
be found in schizophrenics and not in 
healthy people, or a significant quan- 
titative difference should be found. 
Hoffer (1957) published observations 
on adrenolutine, an isomer of adrenochrome
which is more stable. He tried
to determine which persons had been
given adrenolutine and which the
placebo, with equivocal results.
Gastaldi (1957) attempted to ascertain
the effect of the more highly reactive,
unstable adrenochrome and
reported the absence of behavioral
abnormalities. In a recent study
(Holland, Cohen, Goldenberg, Sha,
& Leifer, 1958) equilibrium plasma
concentrations of adrenaline and
noradrenaline were measured and no
significant difference between healthy
people and schizophrenics with regard
to the rate of utilization of circulating
adrenaline and noradrenaline
was found. Axelrod (1958) attempted
to detect the presence of
adrenochrome in schizophrenics and
found that it was absent in schizophrenics
as well as healthy people.
Hoffer (1957) deduced from his
theory that chemical interference in
the production of epinephrine from
methylation of norepinephrine should
effectively reduce the amount of
adrenochrome and thus lead to 
significant improvement in psychiat- 
ric patients. As nicotinic acid and 
nicotinamide are potent methyl 
group acceptors in vivo, he adminis- 
tered these compounds, and reported 
considerable success in the treatment 
of acute schizophrenics and little or 
no success with chronic schizo- 
phrenics. These results do not neces- 
sarily substantiate the theory. 
The disappointing results of these 
experiments have been rationalized 
by the theorists in at least two ways. 
First, it has been argued that so un- 
stable and biologically active a ma- 
terial as adrenochrome if introduced 
into the blood stream may not sur- 
vive unchanged by the time it 
reaches the brain. Second, the point 
is made concerning the relevance of 
the blood-levels of the various endog- 
enous metabolites to the activity of 
these toxins in the brain. For, if
adrenaline and noradrenaline are
neurohumors, then the substances required
for their synthesis and destruction
are also in the brain. Faulty
metabolism of these catechol amines
could produce toxins locally which
would not be reflected in the blood
because the toxins could not pass the
blood-brain barrier.

The evidence in support of the
prominent role ascribed to adrenochrome
is based on some physiological
observations on animals and indirect
clinical evidence. Schwarz,
Wakin, Bickford, and Lichtenheld
(1956) reported that intraventricular
injections of adrenochrome produced
a drowsy state in monkeys with manifest
heightened thresholds to painful
stimuli. Walaszek, Smith, and Minz
(1958) found that the serum of
schizophrenics either abolished or reversed
the systemic vasopressor response
produced by topical application
of adrenaline to the exposed
cerebral cortex of the rabbit, whereas
the serum of healthy people had no 
such effect. Indirect clinical evi- 
dence to support the role of adreno- 
chrome in schizophrenia was pro- 
vided by Lea (1955). He inferred 
that since adrenochrome is antihis- 
taminic in action, if schizophrenics 
have abnormally excessive quantities 
in their circulation, then they should 
be more resistant to allergic condi- 
tions than healthy people. In his 
study of military schizophrenics, us- 
ing head-injured soldiers as controls, 
he found a highly significant defi- 
ciency of allergic reactions among the 
schizophrenics. 

In summary, the theory which pos- 
tulates the existence of an “M-Sub- 
stance,” an endogenous metabolite of 
adrenaline produced by a deficiency 
in the process of methylation, as yet 
lacks the empirical verification for 
the criteria requisite for confirmation 
of the theory as established by the 
early theorists. Evidence is still
awaited that schizophrenic symptoms
can be elicited in healthy people
by a metabolite that is isolable
from schizophrenics, or that the
metabolite is found solely in
schizophrenics, or at least in
significantly larger amounts
in them as compared to healthy
people.


ANTIMETABOLITES OF SEROTONIN 


While the previous biochemical 
theory of schizophrenia, it would 
seem, was established by an observation
of the similarity in chemical
structure between an hallucinogen,
mescaline, and the neurohumors,
adrenaline and noradrenaline, the
second theory here to be considered
was formulated after the observation
was made that the most powerful
hallucinogen, LSD, was a potent peripheral
antagonist of serotonin
(5-hydroxytryptamine). The role of
serotonin as a chemical mediator of
neural activity in the brain has been 
described by Brodie and Shore (1957). 
The clue as to its important role came 
from its discovery in the brain. Like 
acetylcholine, the substance is pres- 
ent in nervous tissue in a precursor 
state and is active only in an un- 
bound form. The amine is unevenly 
distributed in the brain, its concen- 
tration being highest in the brain- 
stem, especially the hypothalamus, 
lowest in the cortex, and almost un- 
detectable in the cerebellum. The 
high biological activity of this amine 
strongly suggests an important func- 
tion in chemical mediation in sub- 
cortical centers. Additional support 
for a role of serotonin in neural 
transmission comes from considera- 
tion of the distribution of the enzyme 
monoamine oxidase, the enzyme that 
destroys serotonin, and 5-hydroxy- 
tryptophan decarboxylase, the en- 
zyme that synthesizes serotonin, 
both of which are found in highest 
concentration in the hypothalamus. 

Woolley and Shaw (1954) postu- 
lated that the hallucinogenic effect of 
LSD might be due to its interference 
with the action of serotonin centrally,
in the brain. An excess of
serotonin or a deficiency may lead to
transmission dysfunction and concomitant
behavioral disorder. According
to Woolley and Shaw, the
action of LSD, as well as several other
alkaloids, can be ascribed to the indole
moiety in the structures of LSD
and serotonin. Consequently, compounds
containing the indole ring
may act as antimetabolites to serotonin.
LSD is enough like serotonin
in structure to be taken up by the
serotonin receptors in the brain in
lieu of serotonin. This suggests that
mental aberrations are produced by
an inadequate amount of serotonin
at its site of action. However, these
same workers (Shaw & Woolley,
1956) also showed in a later paper
that LSD could act like serotonin in
potentiating its effect on the blood
pressure of anesthetized dogs. This
then suggested the possibility that
there may be an excessive concentration
of serotonin in other forms of
mental disease.

Several studies can be cited which 
tend to support the theory that inter- 
ference in the activity of serotonin in 
the brain (possibly by an endogenous 
antimetabolite containing the indole 
nucleus) is related to psychotic be- 
havior. In one study (Zeller, 1958) it 
was found that after the administra- 
tion of large quantities of tryptophan 
to schizophrenics and healthy people, 
the schizophrenics excreted significantly
less 5-hydroxyindoleacetic acid
than the normals. This acid is a
break-down product of serotonin 
which in turn is a derivative of 
tryptophan. 

The serotonin hypothesis is further 
favored by the fact that reserpine, a 
tranquilizer which contains an indole 
ring, depletes the brain of its store 
of serotonin and its tranquilizing 
effect parallels the time course of this 
depletion and not the time course of 
the presence of reserpine in the brain 
(Shore, Pletscher, Tomich, Carlsson, 
Kuntzman, & Brodie, 1957). It was 
deduced, therefore, that reserpine 
and its active analogues bring about 
tranquilization through interference 
with cellular binding of serotonin. 
Within the framework of the seroto- 
nin hypothesis, Brodie and Shore 
have proposed the following hypo- 
thetical scheme to account for the 
tranquilizing effect of reserpine and
chlorpromazine and the hallucinogenic
effects of LSD and mescaline.
They propose that the hypothalamic
parasympathetic centers are activated
by serotonin-liberating, or
“serotonergic,” nerves, and the sympathetic
centers by norepinephrine-liberating,
or adrenergic, nerves.
Dominance of the parasympathetic 
centers causes sedation or tranquili- 
zation, and that of the sympathetic 
centers, wakefulness. Reserpine, by 
preventing serotonin binding, will 
thus cause constant activation of the 
parasympathetic centers, while chlor- 
promazine, the other potent tran- 
quilizer, would bring about sedation 
by blockade of adrenergic impulses 
to the central sympathetic centers 
and the consequent dominance of 
parasympathetic activity. 
This paradigm permits Brodie and 
Shore to postulate the following 
mechanism of action of the two most 
potent hallucinogenic compounds. 
LSD blocks stimulation of the hypo- 
thalamic parasympathetic centers by 
interfering with the release of seroto- 
nin from the “serotonergic nerves,” 
thus unmasking the action of the 
opposing sympathetic system. Mescaline,
on the other hand, stimulates
the “adrenergic” brain centers of the
posterior hypothalamus directly, and
mimics the action of norepinephrine.
Thus an apparent parallelism between
psychotic-like behavior and
increased sympathetic activity pro- 
duced by the hallucinogens has been 
proposed by Brodie and Shore, which 
tends in part to support Hoffer’s 
emphasis upon the role of epinephrine 
and norepinephrine in the etiology of 
schizophrenia. A stricter interpre- 
tation of the Brodie-Shore hypotheses 
would restrict the effect of the 
tranquilizers and the hallucinogenic 
compounds to levels of activity, i.e., 
sedation, wakefulness, without any 
reference to psychotic behavior. For,
at no time do the authors attempt to 
systematically relate the increased 
wakefulness resulting from excessive 
stimulation of adrenergic activity to 
psychotic or psychotic-like states. 
The only relationship that may be 
logically deduced from their writing
is that psychotics are wide-awake.

Although the foregoing hypotheses 
regarding the role of serotonin in 
mental function seem attractive, 
there are some cogent criticisms 
which adherents of the serotonin 
paradigm have yet to answer. A close 
analogue of LSD, 2-bromo-D-lysergic 
acid diethylamide (Brom LSD) was 
found by Cerletti and Rothlin (1955) 
to be as effective as LSD in antago- 
nizing several peripheral actions of 
serotonin, yet this antagonist of sero- 
tonin was found to be devoid of hal- 
lucinogenic action in humans. It 
should be noted that chlorpromazine 
is also a potent antiserotonin and 
instead of being hallucinogenic is a 
powerful antihallucinogen. 

Before proceeding to a brief de- 
scription of still another recent bio- 
chemical theory, it would serve well 
to re-emphasize how important a 
role the hallucinogens, specifically 
LSD and mescaline, have played in 
the two previous biochemical formu- 
lations of schizophrenia. The pro- 
ponents of each theory have bor- 
rowed freely from each others’ 
research efforts to substantiate hy- 
potheses or propose biochemical 
nuances. This cross-fertilization im- 
plies that we have two distinct 
biochemical approaches which ulti- 
mately will coalesce into a unified 
biochemical theory of schizophrenia. 
This hope for the future is somewhat 
mitigated by a consideration of the 
clinical effects of LSD and mescaline 
which appear to be significantly dif- 
ferent. Thus Matefi (1952) employ- 
ing himself as a subject to study the 
effects of LSD and mescaline reported 
that they produced different psycho- 
pathologic reactions; the former, one 
of hebephrenic type and the latter 
catatonic. Drawings produced under 
the influence of LSD showed a tendency
to expansion, while the “mescaline
pictures” showed constriction.
Fischer, Georgi, and Weber (1951)
also reported that LSD produced a 
predominantly hebephrenic state, 
whereas during mescaline intoxica- 
tion catatonic features were out- 
standing. From such experiments in 
which subjects receive both drugs, it 
is a little difficult to see how pre- 
sumably the same biochemical im- 
pairment can produce different ex- 
periences within the subject which 
are also significantly differentiable by 
an observer. These clinical observa- 
tions lead to the obvious suggestion 
that the psychotic-like experience 
elicited by the hallucinogens is not 
homogeneous. Furthermore, as the 
behavioral concomitants of these 
toxic states are clearly differentiable, 
we may be dealing with two distinct 
biochemical processes, two distinct 
detoxification processes that are set 
off by various classes of chemical 
compounds. In short, considering the 
lack of identity between schizophrenia 
and the toxic states produced by 
sundry hallucinogens and the possi- 
bility of heterogeneity in mental dys- 
function as specifically related to the 
class of compound administered to 
healthy people, the author may not 
have been considering biochemical 
theories of schizophrenia but rather 
biochemical theories of chemical psy- 
choses. Future studies will un- 
doubtedly clarify some of these issues. 


INTERFERENCE IN ACETYLCHOLINE 
METABOLISM 


In addition to Hoffer, Pfeiffer and 
Jenney (1957) have emphasized the 
role of brain acetylcholine as a pos- 
sible etiological factor in schizo- 
phrenia. In contrast to Hoffer who 
believes that schizophrenia is charac- 
terized in part by excessive central 
parasympathetic (cholinergic) ac- 
tivity, Pfeiffer and Jenney have pre- 
sented evidence which supports the 
view that schizophrenics are deficient
in acetylcholine. They base their
argument on the facts that: (a) many
tranquilizers have persistent acetylcholine-like
effects; (b) drugs which
show the nicotinic properties of
acetylcholine, e.g., di-isopropyl fluorophosphate,
make schizophrenics
worse; and (c) drugs which show
muscarinic effects, e.g., arecoline,
make them better. In their study involving
23 schizophrenics they administered
arecoline, a parasympathetic
stimulant that passes the
blood-brain barrier freely and leads 
to pronounced cholinergic effects, 
and methyl atropine nitrate, a 
quaternary analogue of atropine that 
protects against the peripheral effects 
of parasympathetic stimulants. Re- 
markable changes, it is claimed, en- 
sued within 1-2 min. and lasted for 
15 min. During a lucid interval which 
persisted for 15 min., the patients 
became more talkative and showed
greater insight and sociability.
Recently, Rubin (1958a; 1958b) 
has reported the results of an experi- 
ment that may help to resolve the 
manifest contradiction between the 
theoretical formulations of Hoffer and 
Pfeiffer. It was shown that the 
hydrolysis rate of acetylcholine by 
human erythrocyte cholinesterase 
was significantly different between 
healthy people and psychotics. Fur- 
thermore, it was demonstrated that 
the psychotics were distributed be- 
tween two discrete groups, each of 
which differed significantly from the 
normals with respect to the kinetics 
of enzymatic activity. One group 
hydrolyzed the substrate more 
rapidly than the normals while the 
other group hydrolyzed it more 
slowly. As rapid hydrolysis reduces 
the concentration of free acetyl- 
choline, the former group may be 
assumed to be deficient in acetyl- 
choline, and therefore corresponds to 
the defect postulated by Pfeiffer. 
Defective enzymatic hydrolysis of
acetylcholine in the latter group of
patients may be characterized by
excessive cholinergic activity; these
patients would deviate from Pfeiffer's
model while being congruent with Hoffer's
hypothesis.


SUMMARY 


A brief review of several outstanding
empirical studies suggests that
schizophrenia is characterized by 
some disordered metabolic response 
to stress which in turn is dependent 
upon a neurohumoral or enzymatic 
defect. The three outstanding the- 
ories described have been derived 
from considerations of the chemical
and pharmacological properties of the 
hallucinogenic or psychotomimetic 
agents. The fruitfulness of these 
theories and their biochemical con- 
structs seems to be dependent upon 
the validity of the basic underlying 
equivalence relationship presumed to 
exist between the model psychosis 
produced by psychotomimetic agents 
and the pathological state(s) char- 
acterized as schizophrenia. The in- 
tense activity of researchers in this 
area should soon provide the rele- 
vant information required for evalu- 
ation of the basic assumption and 
possibly an effective chemotherapy 
for the schizophrenias.


University of Massachusetts


Statistical interaction has been 
most commonly defined as a measure 
of the joint effect of p variables upon 
performance. If a scaled independent 
variable is involved, the interaction 
sum of squares may be further ana- 
lyzed, yielding more information 
than the above definition would indi- 
cate. This analysis was first pre- 
sented by Alexander (1946), and 
subsequently extended by Grant 
(1956). This paper will present a 
complete analysis of the (a − 1)(b − 1)
degrees of freedom (df) involved in 


action term and to introduce a meth- 


ces about 
nd curva- 


steps), the (b— 1)(a—1) df can be an- 


Each of the four (b − 1) mean squares
would be on two (a − 1) df. The appropriate
error terms for these comparisons
have also been presented by
Grant.

If A and B are both scaled variables there are a number of alternative ways of analysing the data. We could, as above, compare the A curves over the levels of B. Alternatively, the B curves might be compared over the values of A. A more efficient analytical approach exists which will provide more information than can be obtained from performance of both the previously suggested analyses, and which will also shed light on the relation between the two ways of graphing the data.

The proposed analysis stems from one basic fact: if both variables are scaled, a sum of squares on 1 df can be computed for each of (a − 1)(b − 1) components of the interaction sum of squares. Table 1 presents a set of data which has been analyzed in this manner. The design is a 5 × 3 factorial, with four entries in each of the 15 cells. The analysis may be approached in either of two ways. The B curves may be compared with respect to each of a − 1 components; each sum of squares is then further analyzed into b − 1 components. This has been done in the left half of Table 2.

Alternatively, the A curves may be compared with respect to each of b − 1 components, followed by further analysis of each sum of squares


Sum of squares, computed in the 
usual manner, is 151.067, 
analysis requi 


values of orthogonal 
TABLE 1
DATA AND COMPUTATIONAL AIDS FOR THE ANALYSIS

                A1    A2    A3    ΣY    ΣX1Y   ΣX2Y
B1              20    18    16    54     -4      0
                18    17    15    50     -3     -1
                19    18    17    54     -2      0
                16    16    15    47     -1     -1
B1 Totals       73    69    63   205    -10     -2
B2              18    15    14    47     -4      2
                18    16    13    47     -5     -1
                17    14    14    45     -3      3
                16    13    13    42     -3      3
B2 Totals       69    58    54   181    -15      7
B3              16    12    11    39     -5      3
                18    13    14    45     -4      6
                17    12    13    42     -4      6
                16    10    12    38     -4      8
B3 Totals       67    47    50   164    -17     23
B4              15     5     6    26     -9     11
                18     8     8    34    -10     10
                17     7     9    33     -8     12
                17     5     5    27    -12     12
B4 Totals       67    25    28   120    -39     45
B5              17     7     6    30    -11      9
                18    10     9    37     -9      7
                18     9     8    35    -10      8
                15     8     7    30     -8      6
B5 Totals       68    34    30   132    -38     30
Totals         344   233   225   802   -119    103


386 JEROME L. MYERS 
(Fisher & Yates, 1953). For 4 df the values are:

    Z1 = -2  -1   0  +1  +2
    Z2 = +2  -1  -2  -1  +2
    Z3 = -1  +2   0  -2  +1
    Z4 = +1  -4  +6  -4  +1

For 2 df the values are:

    X1 = -1   0  +1
    X2 = +1  -2  +1

The subscript "1" refers to linear, "2" to quadratic, "3" to cubic, "4" to quartic. The sum of squares for the comparison of the linear component of the B curves may now be calculated:

$$SS_{B\text{-linear}} = \frac{\sum_{j}^{b}\left(\sum_{i}^{n}\sum_{k}^{a} X_1 Y\right)^2}{n\sum_{k}^{a} X_1^2} - \frac{\left(\sum_{j}^{b}\sum_{i}^{n}\sum_{k}^{a} X_1 Y\right)^2}{bn\sum_{k}^{a} X_1^2} \qquad [1]$$

where
i refers to scores within cells, i = 1, 2, ..., n;
j refers to levels of B, j = 1, 2, ..., b;
k refers to levels of A, k = 1, 2, ..., a.

In our example n = 4, a = 3, and b = 5. The calculations are

SS_B-linear = [(−10)² + (−15)² + (−17)² + (−39)² + (−38)²]/(4)(2) − (−119)²/(5)(4)(2)
            = 93.350

This quantity is on 4 df and may now be analyzed into 4 components each on a single df. We have called these components B-linear (1), B-linear (2), etc. The calculations follow.

$$SS_{B\text{-linear}(1)} = \frac{\left(\sum_{j}^{b}\sum_{i}^{n}\sum_{k}^{a} Z_1 X_1 Y\right)^2}{n\sum_{j}^{b} Z_1^2 \sum_{k}^{a} X_1^2} \qquad [2]$$

In our example

SS_B-linear (1) = [(−2)(−10) + (−1)(−15) + (0)(−17) + (1)(−39) + (2)(−38)]²/(4)(10)(2)
                = 80.000

The SS_B-linear (2) would be calculated by using values of Z2, rather than Z1, in Equation [2]. Thus

SS_B-linear (2) = [(2)(−10) + (−1)(−15) + (−2)(−17) + (−1)(−39) + (2)(−38)]²/(4)(14)(2)
                = 0.571

Values of X2, rather than X1, are utilized in the calculations of SS_B-quadratic and its components.

SS_B-quadratic = [(−2)² + (7)² + (23)² + (45)² + (30)²]/(4)(6) − (103)²/(5)(4)(6)
               = 57.717

SS_B-quadratic (1) = [(−2)(−2) + (−1)(7) + (0)(23) + (1)(45) + (2)(30)]²/(4)(6)(10)
                   = 43.350

By substituting values of Z2, Z3, and Z4 for Z1, SS_B-quadratic (2), SS_B-quadratic (3), and SS_B-quadratic (4) can be computed.
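The arithmetic of Equations [1] and [2] can be checked with a short script. This is only a sketch: the function and variable names are illustrative assumptions, not part of the original paper, and the inputs are the ΣX1Y and ΣX2Y totals for the five levels of B read from Table 1.

```python
# Per-level totals from Table 1 (summed over the n = 4 entries at each level of B).
lin_totals = [-10, -15, -17, -39, -38]   # sum of X1*Y at B1..B5
quad_totals = [-2, 7, 23, 45, 30]        # sum of X2*Y at B1..B5
n = 4                                    # entries per cell

# Orthogonal polynomial coefficients quoted in the text.
X1 = [-1, 0, 1]
X2 = [1, -2, 1]
Z = [
    [-2, -1, 0, 1, 2],    # Z1, linear
    [2, -1, -2, -1, 2],   # Z2, quadratic
    [-1, 2, 0, -2, 1],    # Z3, cubic
    [1, -4, 6, -4, 1],    # Z4, quartic
]


def ss_over_b(totals, x):
    """Equation [1]: sum of squares on b - 1 df for one component over A."""
    b = len(totals)
    sum_x_sq = sum(c * c for c in x)
    between = sum(t * t for t in totals) / (n * sum_x_sq)
    correction = sum(totals) ** 2 / (b * n * sum_x_sq)
    return between - correction


def ss_single_df(totals, x, z):
    """Equation [2]: one single-df component of the interaction."""
    sum_x_sq = sum(c * c for c in x)
    sum_z_sq = sum(c * c for c in z)
    weighted = sum(zj * t for zj, t in zip(z, totals))
    return weighted ** 2 / (n * sum_z_sq * sum_x_sq)


b_linear = ss_over_b(lin_totals, X1)
b_quadratic = ss_over_b(quad_totals, X2)
b_linear_parts = [ss_single_df(lin_totals, X1, z) for z in Z]
b_quadratic_parts = [ss_single_df(quad_totals, X2, z) for z in Z]

print(round(b_linear, 3), [round(v, 3) for v in b_linear_parts])
print(round(b_quadratic, 3), [round(v, 3) for v in b_quadratic_parts])
```

The four single-df values in each list should add up to the corresponding 4-df sum of squares, which is a convenient check on the coefficients.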
TABLE 2
TWO APPROACHES TO THE ANALYSIS OF THE INTERACTION

Source             df      SS        Source             df      SS
A×B                 8   151.067      A×B                 8   151.067
B-linear            4    93.350      A-linear            2   123.350
B-linear (1)        1    80.000      A-linear (1)        1    80.000
B-linear (2)        1     0.571      A-linear (2)        1    43.350
B-linear (3)        1     5.000      A-quadratic         2     5.821
B-linear (4)        1     7.779      A-quadratic (1)     1     0.571
B-quadratic         4    57.717      A-quadratic (2)     1     5.250
B-quadratic (1)     1    43.350      A-cubic             2    13.067
B-quadratic (2)     1     5.250      A-cubic (1)         1     5.000
B-quadratic (3)     1     8.067      A-cubic (2)         1     8.067
B-quadratic (4)     1     1.050      A-quartic           2     8.829
                                     A-quartic (1)       1     7.779
                                     A-quartic (2)       1     1.050


The mean squares on single df's are measures of changes in slope and curvature of the B curves, over the levels of B. As an illustration, assume that we are interested in determining the function which describes the relation of dark adaptation rate (i.e., the slope of the dark adaptation curve) to preadapting light intensity. If we let B stand for intensity and A for trials, the SS_B-linear (1) would enable us to test the hypothesis that rate of dark adaptation shows a linear decrement with increases in intensity. The SS_B-linear (2) would enable us to test the hypothesis that the function relating rate of dark adaptation to intensity has a quadratic component. The SS_B-quadratic (1) term is a measure of the extent to which the plot of the quadratic coefficient of the dark adaptation curve shows a linear change as a function of intensity. The Grant analysis yields a test of the hypothesis that the slopes (or quadratic, cubic, etc. components) of a number of curves do not differ. If they do differ, the analysis under discussion yields inferences about the way in which the component changes from curve to curve.

The analysis of the data in Table 1 was carried out by first computing the B-linear and B-quadratic components, then analyzing each of these into four components. We might have, as easily and meaningfully, computed four A components and analyzed each of these into two components. The results of these analyses are shown in Table 2. Note certain relationships between the two computational approaches. For example, B-linear (2) is equal to A-quadratic (1), B-quadratic (4) is equal to A-quartic (2). The jth component of the kth component of A will always equal the kth component of the jth component of B. Thus tests of hypotheses about the B-curves may be generated after an analysis of the A-curves by simply regrouping the single df components, and then adding. For example, B-linear is the sum of A-linear (1), A-quadratic (1), A-cubic (1), and A-quartic (1).


A discussion of error terms still remains. There are three possible cases: I. Each S is measured once under only one combination of A and B. II. There are different Ss at each level of B, but all Ss are measured at all levels of A (or B may be the within-Ss variable, and A the between-Ss variable). III. All Ss are measured under all combinations of A and B.

Table 3 presents the error terms 
(denominator of F) for all interaction 
components, in each case. 

Case I. The error term for all 
terms in Table 2 is the within-cells 
mean square. This value is 1.267. 
There are ab(n— 1) df or, in this case, 
45. The within-cells sum of squares is,
as always, the total sum of squares 
minus the between-cells sum of 
squares. 

TABLE 3
ERROR TERMS FOR THREE DESIGNS

Case I
    Numerator of F                               Denominator of F
    All terms in Table 2                         Within cells (1.267)

Case II
    Numerator of F                               Denominator of F
    AB, A-linear, A-quadratic,
      A-cubic, A-quartic                         Ss×A/B (.550)
    B-linear and all its components,
      A-linear (1), A-quadratic (1),
      A-cubic (1), A-quartic (1)                 error-linear (.742)
    B-quadratic and all its components,
      A-linear (2), A-quadratic (2),
      A-cubic (2), A-quartic (2)                 error-quadratic (.358)

Case III
    Numerator of F                               Denominator of F
    AB                                           SAB (.538)
    B-linear                                     error-B-linear (.738)
    B-quadratic                                  error-B-quadratic (.340)
    A-linear                                     error-A-linear (.197)
    A-quadratic                                  error-A-quadratic (.903)
    A-cubic                                      error-A-cubic (.456)
    A-quartic                                    error-A-quartic (.600)
    B-linear (1), A-linear (1)                   error-B-linear (1) (.333)
    B-linear (2), A-quadratic (1)                error-B-linear (2) (1.000)
    B-linear (3), A-cubic (1)                    error-B-linear (3) (.833)
    B-linear (4), A-quartic (1)                  error-B-linear (4) (.783)
    B-quadratic (1), A-linear (2)                error-B-quadratic (1) (.061)
    B-quadratic (2), A-quadratic (2)             error-B-quadratic (2) (.806)
    B-quadratic (3), A-cubic (2)                 error-B-quadratic (3) (.078)
    B-quadratic (4), A-quartic (2)               error-B-quadratic (4) (.417)


Case II. Assume that each row in Table 1 represents a different S. The AB interaction is tested against a Ss×A/B (subjects-by-A-within-B) error term on b(n − 1)(a − 1) df (30 df). The computational formula is:

$$SS_{Ss \times A/B} = SS_{\text{total}} - SS_{Ss} - SS_{A} - SS_{AB} = 16.500 \qquad [3]$$

The mean square of .550 is the error term for A-linear (2 df), A-quadratic, A-cubic, and A-quartic, as well as for the AB interaction.
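Equation [3] can be verified directly from the raw scores of Table 1. The sketch below is illustrative only: the variable names are assumptions, and the first row at B1, which is illegible in the scanned table, is taken to be 20, 18, 16, the only values consistent with the printed B1 totals.

```python
# scores[j][i] = (A1, A2, A3) for subject i at level B(j+1), as in Table 1.
# Assumption: the first B1 row (20, 18, 16) is inferred from the B1 totals.
scores = [
    [(20, 18, 16), (18, 17, 15), (19, 18, 17), (16, 16, 15)],
    [(18, 15, 14), (18, 16, 13), (17, 14, 14), (16, 13, 13)],
    [(16, 12, 11), (18, 13, 14), (17, 12, 13), (16, 10, 12)],
    [(15, 5, 6), (18, 8, 8), (17, 7, 9), (17, 5, 5)],
    [(17, 7, 6), (18, 10, 9), (18, 9, 8), (15, 8, 7)],
]
b, n, a = len(scores), len(scores[0]), len(scores[0][0])

all_values = [y for level in scores for row in level for y in row]
correction = sum(all_values) ** 2 / (a * b * n)

ss_total = sum(y * y for y in all_values) - correction

# Between-subjects sum of squares (each row of Table 1 is a different S).
ss_ss = sum(sum(row) ** 2 for level in scores for row in level) / a - correction

# Main effects and interaction from the marginal and cell totals.
a_totals = [sum(row[k] for level in scores for row in level) for k in range(a)]
b_totals = [sum(sum(row) for row in level) for level in scores]
cell_totals = [[sum(row[k] for row in level) for k in range(a)] for level in scores]

ss_a = sum(t * t for t in a_totals) / (b * n) - correction
ss_b = sum(t * t for t in b_totals) / (a * n) - correction
ss_cells = sum(t * t for level in cell_totals for t in level) / n - correction
ss_ab = ss_cells - ss_a - ss_b

# Equation [3]: the subjects-by-A-within-B error term and its mean square.
ss_ss_a_within_b = ss_total - ss_ss - ss_a - ss_ab
ms_ss_a_within_b = ss_ss_a_within_b / (b * (n - 1) * (a - 1))

print(round(ss_ab, 3), round(ss_ss_a_within_b, 3), round(ms_ss_a_within_b, 3))
```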

The linear component of error (error-linear) is computed from

$$SS_{\text{error-linear}} = \frac{\sum_{j}^{b}\sum_{i}^{n}\left(\sum_{k}^{a} X_1 Y\right)^2}{\sum_{k}^{a} X_1^2} - \frac{\sum_{j}^{b}\left(\sum_{i}^{n}\sum_{k}^{a} X_1 Y\right)^2}{n\sum_{k}^{a} X_1^2} \qquad [4]$$

= [(−4)² + (−3)² + ... + (−10)² + (−8)²]/2 − [(−10)² + (−15)² + ... + (−38)²]/(4)(2)
= 11.125

The corresponding mean square (df = b(n − 1) = 15) is .742. This is the error term for B-linear and all its components, and for A-linear (1), A-quadratic (1), A-cubic (1), and A-quartic (1).

The quadratic component of error (error-quadratic) is given by Equation [4] with values of X2 substituted for X1. The sum of squares is 5.375; the mean square is .358 on 15 df, and is the error term for B-quadratic and all its components, and for A-linear (2), A-quadratic (2), A-cubic (2), and A-quartic (2).

Case III. Assume that the first row at each level of B (Table 1) represents the performance of a single S, that the second rows represent a second S, etc. Thus we have four Ss going through 15 combinations of A and B. Table 4 presents various sums of crossproducts which should facilitate calculations of the error terms for the single df components.

The error term for AB is SAB, and is computed in the usual manner. The resulting sum of squares is 12.933, which, when divided by 24 (df = (n − 1)(a − 1)(b − 1)) yields a mean square of .538. The SAB sum of squares may next be analyzed into a − 1 components, each on (n − 1)(b − 1) df. In our example we have an error-B-linear and an error-B-quadratic, each on 12 df.

INTERACTION OF TWO SCALED VARIABLES

$$SS_{\text{error-}B\text{-linear}} = \frac{\sum_{j}^{b}\sum_{i}^{n}\left(\sum_{k}^{a} X_1 Y\right)^2}{\sum_{k}^{a} X_1^2} - \frac{\sum_{j}^{b}\left(\sum_{i}^{n}\sum_{k}^{a} X_1 Y\right)^2}{n\sum_{k}^{a} X_1^2} - \frac{\sum_{i}^{n}\left(\sum_{j}^{b}\sum_{k}^{a} X_1 Y\right)^2}{b\sum_{k}^{a} X_1^2} + \frac{\left(\sum_{j}^{b}\sum_{i}^{n}\sum_{k}^{a} X_1 Y\right)^2}{bn\sum_{k}^{a} X_1^2} \qquad [5]$$

The mean square is .738 and is the error term for B-linear. The sum of squares for the error-B-quadratic is obtained by substituting X2 values in Equation [5]. The resulting sum of squares is 4.083, and the mean square is .340, the latter being the error term for B-quadratic. The SAB term could also have been analyzed into b − 1 components, each on (n − 1)(a − 1) df. Equation [5] would then contain Z values rather than X values, and would yield the sums of squares for error-A-linear, error-A-quadratic, etc. These in turn would provide F tests for A-linear, A-quadratic, etc.

Next consider the error terms for the B-linear and B-quadratic components. The sum of squares for error-B-linear (1) is given by

$$SS_{\text{error-}B\text{-linear}(1)} = \frac{\sum_{i}^{n}\left(\sum_{j}^{b}\sum_{k}^{a} Z_1 X_1 Y\right)^2}{\sum_{k}^{a} X_1^2 \sum_{j}^{b} Z_1^2} - SS_{B\text{-linear}(1)} \qquad [6]$$

= [(−19)² + ... + (−23)²]/(2)(10) − 80.000
= 1.00

The mean square on n − 1 (= 3) df is .333 and is the error term for B-linear (1). The remaining error components (mean squares) are found in Table 3, and were calculated in similar fashion, by substituting appropriate values of X and Z in Equation [6]. Relationships previously pointed out with regard to the AB interaction hold for SAB. Thus, error-B-linear (2) is identical to error-A-quadratic (1), for example. This indicates that error-A-linear, error-A-quadratic, etc. can be obtained by adding appropriate error-B components. For example, error-B-linear (1) and error-B-quadratic (1) should equal error-A-linear.

The analyses described permit inferences about such matters as the rate of learning as a function of amount of practice, the rate of dark adaptation as a function of preadaptation intensity, or the rate of extinction as a function of number of conditioning trials. While such information may appear meaningful and useful, the reader may wonder if tests of quantities such as B-quadratic (4) are of any utility. To this, it may be answered that the complete single df analysis should extend our understanding of the interaction term.

calculation of addi-
provides computational

are of major interest. Finally, it is
hoped that t
from such ana
basis for more qu
and accurate pred
havior than our pr
erally yield.
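The error sums of squares of Equations [4] and [6] can be reproduced from the ΣX1Y column of Table 1. The following is a sketch under stated assumptions: the names are illustrative, and the first B1 entry (−4) corresponds to the row that is illegible in the scanned table and is inferred from the B1 total of −10.

```python
# x1_sums[j][i] = sum over A of X1*Y for subject (row) i at level B(j+1).
# Assumption: the first B1 entry (-4) is inferred from the printed B1 total.
x1_sums = [
    [-4, -3, -2, -1],
    [-4, -5, -3, -3],
    [-5, -4, -4, -4],
    [-9, -10, -8, -12],
    [-11, -9, -10, -8],
]
Z1 = [-2, -1, 0, 1, 2]
b, n = len(x1_sums), len(x1_sums[0])
sum_x1_sq = 2     # (-1)^2 + 0^2 + 1^2
sum_z1_sq = 10    # sum of squared Z1 coefficients

# Equation [4], Case II: linear component of the Ss x A/B error.
ss_error_linear = (
    sum(v * v for level in x1_sums for v in level) / sum_x1_sq
    - sum(sum(level) ** 2 for level in x1_sums) / (n * sum_x1_sq)
)
ms_error_linear = ss_error_linear / (b * (n - 1))

# Equation [2]: B-linear (1), needed as the subtracted term in Equation [6].
ss_b_linear_1 = (
    sum(z * sum(level) for z, level in zip(Z1, x1_sums)) ** 2
    / (n * sum_z1_sq * sum_x1_sq)
)

# Equation [6], Case III: row i at every level of B is the same subject.
per_subject = [sum(z * level[i] for z, level in zip(Z1, x1_sums)) for i in range(n)]
ss_error_b_linear_1 = (
    sum(s * s for s in per_subject) / (sum_x1_sq * sum_z1_sq) - ss_b_linear_1
)
ms_error_b_linear_1 = ss_error_b_linear_1 / (n - 1)

print(ss_error_linear, round(ms_error_linear, 3))
print(per_subject, ss_error_b_linear_1, round(ms_error_b_linear_1, 3))
```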
MULTIPLE COMPARISONS IN ANALYSIS OF VARIANCE


JOHN GAITO 
Wilkes College 


Recently Ryan (1959) has provided a valuable service for psychologists by presenting newer techniques for making multiple comparisons after the analysis of variance F test has rejected the over-all null hypothesis that the means are equal. Some of these techniques previously were not available to psychologists because they were discussed in scattered sources. However, Ryan does not mention the fact that the analysis of variance technique does provide a means of making individual comparisons (after rejection of the null hypothesis) which is suitable for many situations. This procedure involves partitioning the n degrees of freedom for the main effect into n orthogonal components, each with one df (Edwards, 1951; Senders, 1958; Snedecor, 1946) and is suitable for either one variable or multiple variable designs.
As an example of this procedure 
let us take the case in which we have 
three groups of Ss. Group 1 is a con- 
trol; Groups 2 and 3 are experimental 
groups. Inasmuch as three groups are 


involved, two df are present. Therefore,
we can partiti


squares with two df 
components, each with one df. The 
meaningful comparisons in this situa- 


+Gs, and G, 


As another example let us have two experimental groups (E1 and E2) and a control group for each (C1 and C2). Here we have a sum of squares for between groups with three df. We partition these three df into three single components, each with one df. The experimental hypotheses would determine which comparisons would be relevant. In this case it appears that the investigator would be concerned with a comparison between (C1 + E1) and (C2 + E2) for one df. A second df would involve the comparison, C1 vs. E1; the third df, C2 vs. E2. Each of these three tests would be by the F ratio, the denominator being that which is provided by the analysis of variance model and would be the same for all three tests. This procedure allows one to make use of the power of the variance analysis technique (a well-explored robust procedure) rather than to develop new procedures. Thus the probability statements would be exact and the error rate per comparison would be that of the probability level used for the tests of significance, as is usual with the analysis of variance procedure. This would maintain the error rate per experiment at a lower rate than would occur if all possible comparisons were effected.
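Whether a proposed set of comparisons is orthogonal can be checked mechanically. The sketch below is illustrative only: the coefficient vectors and the helper name are assumptions introduced here, the groups are ordered (C1, E1, C2, E2), and equal group sizes are assumed so that orthogonality reduces to zero sums and zero pairwise dot products.

```python
from itertools import combinations

# Contrast coefficients over the groups (C1, E1, C2, E2); equal n assumed.
planned = {
    "(C1+E1) vs (C2+E2)": (1, 1, -1, -1),
    "C1 vs E1": (1, -1, 0, 0),
    "C2 vs E2": (0, 0, 1, -1),
}
alternative = {
    "(E1+E2) vs (C1+C2)": (-1, 1, -1, 1),
    "E1 vs E2": (0, 1, 0, -1),
    "C1 vs C2": (1, 0, -1, 0),
}


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and all pairs have zero dot product."""
    vectors = list(contrasts.values())
    sums_ok = all(sum(v) == 0 for v in vectors)
    pairs_ok = all(
        sum(p * q for p, q in zip(u, v)) == 0 for u, v in combinations(vectors, 2)
    )
    return sums_ok and pairs_ok


# Mixing comparisons from the two sets generally breaks orthogonality.
mixed = {"C1 vs E1": planned["C1 vs E1"], "E1 vs E2": alternative["E1 vs E2"]}

print(is_orthogonal_set(planned), is_orthogonal_set(alternative), is_orthogonal_set(mixed))
```

Both complete sets use up the three df between groups, which is why the choice between them has to rest on the experimental hypotheses rather than on the data.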
With the partitioning procedure it is important that the orthogonal comparisons be planned before the experiment is conducted. In the second example, if the investigator had not planned the comparisons, he might be tempted to make different orthogonal comparisons than those above after he looks at the data. For example, he might compare (E1 + E2) and (C1 + C2), E1 vs. E2, and C1 vs. C2. However, which orthogonal comparisons are to be effected will be de-


1958.) 
Using this procedure, Ryan's Case 1 (multiple comparisons) reduces essentially to his Case 3 (multiple variables in analysis of variance). A single variable might be involved but the partitioning into n components gives the analysis the appearance of a multiple variable design, according to procedure in making tests of significance. Thus, two or more F ratios will be involved and, in a strict sense, these ratios will not be independent because a common error estimate will be used. However, this lack of independence is irrelevant. The important point is that the numerator and denominator of the F ratio be independent. Each F ratio is based on the mathematical model (the F distribution) and has all the power of that model.

The above examples indicate the 
suitability of the partitioning pro- 
cedure for many cases. However, in 
some experiments the partitioning 
procedure will not be completely 


efficacious inasmuch as all compari- 
sons which are required by the ex- 
perimental hypotheses may not be 
orthogonal. Yet if most comparisons 
are orthogonal ones it would appear 
that little loss of exactitude would 
occur with few nonorthogonal com- 
parisons. Sometimes the results of 
the orthogonal comparisons will pro- 
vide indirect information concerning 
the relative ranking of all different 
groups, thus obviating the need for 
making nonorthogonal comparisons. 
But in this situation the new multiple
comparison techniques which Ryan 
discusses might be used. However, it 
should be pointed out that a trend 
analysis (Lindquist, 1953) might be 
more appropriate if the groups differ 
quantitatively (or qualitatively if the 
groups can be ordered). However, a 
regression analysis may be more 
meaningful if the groups differ in an 
orderly quantitative manner. 


REFERENCES 


EDWARDS, A. L. Experimental design in psychological research. New York: Rinehart, 1951.

LINDQUIST, E. F. Design and analysis of experiments in psychology and education. New York: Houghton Mifflin, 1953.

RYAN, T. A. Multiple comparisons in psychological research. Psychol. Bull., 1959, 56, 26-47.

SENDERS, V. L. Measurement and statistics. New York: Oxford, 1958.

SNEDECOR, G. W. Statistical methods. Ames: Iowa State Coll. Press, 1946.
Received February 12, 1959. 


PSYCHOLOGICAL BULLETIN
Vol. 56, No. 5, 1959


COMMENTS ON ORTHOGONAL COMPONENTS 


T. A. RYAN 
Cornell University 


Professor Gaito is to be thanked 
for pointing out a method for making
multiple comparisons which was not 
discussed in my paper. Long as that 
paper was, it still could not cover 
all of the possible approaches. The 
method of orthogonal components 
which Gaito proposes has, however, 
several serious drawbacks. Moreover, 
the power of the method is little, if 
any, greater than that of other meth- 
ods which are not subject to these 
difficulties. 

The following points must be kept 
in view in evaluating the method ad- 
vanced by Gaito: 

1. By restricting ourselves to or- 
thogonal comparisons, we are pre- 
vented from making comparisons 
which may be important in under- 
standing and interpreting the results. 
In his first example involving two ex- 
perimental groups and a single con- 
trol, Gaito asks two questions: (a) 
whether the two experimental groups 
differ jointly from the control group, 
and (b) whether the experimental 
groups differ from each other. If we 
ask these two questions we cannot 
then ask which of the experimental 
groups differ from the control.

2. When the number of groups is 
larger than that in Gaito's example, 
we are even more restricted in the 
questions which we can ask. For 

example, if there are 5 experimental 
groups and 1 control, we have a total 
of 5 degrees of freedom and therefore 
5 orthogonal comparisons are possi- 
ble. There are only 4 degrees of free- 
dom for question (b), so we can make 
only 4 of the 10 possible comparisons 
among the experimental groups. 
Dunnett (1955) has provided a
method for comparing each experi- 
mental group with a control while 
controlling the error rate experiment- 
wise. This method makes the same 
number of comparisons as the method 
of orthogonal components, but the 
comparisons it does allow are more 
meaningful for many common situa- 
tions. If we use Dunnett’s method 
we cannot find out which of the ex- 
perimental groups differ from each 
other. To ask these questions brings 
us to the situation where we make all 
possible comparisons among the 
means, and a method like Tukey’s is 
the most appropriate. 

3. The relative power of two methods of comparison can be determined adequately only if the error rates are computed on the same basis. Dunnett's and Tukey's methods are designed to control the error rate experimentwise. To compare these with the method of orthogonal comparisons it is necessary to determine the experimentwise error rate for the latter. When this is done, as has been illustrated in Table 1, we find that the Dunnett method actually requires slightly smaller differences for significance than does the method proposed by Gaito.1 (We must bear in mind, of course, that the Dunnett method is not comparing the same pairs as the method of orthogonal comparisons, so we are comparing the power of one in making its comparisons with the power of the other in making its own.) Tukey's method, which permits us to make all possible comparisons of pairs of means, requires a somewhat greater gap when we are comparing the extreme means, but a smaller separation for adjacent pairs.

1 In order to determine allowances for the method of orthogonal components in Table 1, the F (or t) value was determined at the tabled ".025 level." Since there are two comparisons in the example, the error rate per experiment is twice the nominal level of the single tests, or .05. The experimentwise rate is only slightly less (.04994).

TABLE 1
COMPARISON OF ALLOWANCES FOR 5% LEVEL EXPERIMENTWISE OR PER EXPERIMENT, TWO EXPERIMENTAL GROUPS, AND ONE CONTROL

df for    F* for Orthogonal              Critical t for       Critical t for Tukey Method***
Error     Components           t = √F    Dunnett Method**     Extremes     Adjacent Means
15        6.20                 2.49      2.46                 2.62         2.41
30        5.57                 2.36      2.32                 2.47         2.29
∞         5.02                 2.24      2.21                 2.35         2.17

* Read from F tables for ".025 level" which gives .05 level per experiment, with two comparisons.
** Allowance for comparing each of two experimental means with control mean, at .05 level experimentwise.
*** For all comparisons at .05 level experimentwise, computed from Tukey's tables.

4. When it has been decided in advance that only certain comparisons are of interest, and that other comparisons will never be made under any circumstances, multiple comparison procedures can be adapted to control the error rate per experiment. To do so, it is not necessary to ensure that all comparisons are orthogonal. The error rate is simply the product of the number of comparisons to be made and the nominal significance level of the individual tests (the error rate per comparison). This relation is not affected by the lack of independence of the comparisons (Ryan, 1959, p. 39).

Allowances based upon experimentwise rates may be somewhat smaller than those for the corresponding rates per experiment. This is the reason for the difference between Dunnett's allowances and those based upon F in Column 3 of Table 1. The allowances for the Tukey method would have been slightly larger if we had computed them on the basis of rates per experiment. For example, the allowance for the extremes in the case of 15 degrees of freedom would have been 2.73 instead of 2.62, but this is the largest change which would have been made in the table. When we select our comparisons, however, a method which controls the experimentwise rate is usually not readily available. What we may lose by the shift to the rate per experiment may nevertheless be compensated for by the reduced number of comparisons.
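The counting argument in points 2 and 4 is simple enough to state as code. This is a minimal sketch with illustrative names that are not taken from the paper; it uses only the product rule for the error rate per experiment and the usual count of pairs.

```python
from math import comb


def error_rate_per_experiment(n_comparisons, alpha_per_comparison):
    """Number of comparisons times the nominal level of each test."""
    return n_comparisons * alpha_per_comparison


# Two planned comparisons, each tested at the tabled .025 level.
rate_two_at_025 = error_rate_per_experiment(2, 0.025)

# Five experimental groups and one control: 5 df in all, 4 df among the
# experimental groups, but 10 possible pairs of experimental groups.
n_experimental = 5
total_df = (n_experimental + 1) - 1
df_among_experimental = n_experimental - 1
pairs_among_experimental = comb(n_experimental, 2)

print(rate_two_at_025, total_df, df_among_experimental, pairs_among_experimental)
```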
5. In describing his procedure, Gaito has used the error rate per comparison. This is also the rate per degree of freedom since the number of comparisons equals the degrees of freedom. In my paper, however, I argued that we should control the rate of error per experiment or experimentwise. I shall not review these arguments here, since Gaito has not offered any reasons for preferring the rate per degree of freedom.

Even if we were convinced that the error rate should be based upon the number of degrees of freedom, there are other more satisfactory methods available. Duncan's methods (1955) permit us to make all possible comparisons with an error rate which is proportional to the number of degrees of freedom. The Duncan test will require slightly greater separation of a pair of means for significance, in comparison to those in Gaito's method, but this loss of power is more than compensated for by the possibility of asking more meaningful questions in our analysis of the data.

Apart from this analysis of the method proposed by Gaito, I wish to correct a misunderstanding which may arise from his first sentence. The newer methods for multiple comparisons do not require an initial F test of the over-all variation between groups. All of the methods mentioned in the present note can be applied immediately, without such a preliminary test.
Having raised a number of points 
of disagreement, I can nevertheless 
end with an important area of agree- 
ment, namely the possibility of using 
regression or trend analysis in cases 
where the groups can be ordered or 
classified according to a quantitative 
independent variable. This whole 
approach was not considered in my 
paper because of limitations of space 
and because it is generally much 
better understood than the cases of 
qualitative groupings. At least, in 
regression analysis, the number of 
significance tests does not increase 


geometrically as the number of groups 
or subdivisions goes up. 
ERRATUM

In "Multiple Comparisons in Psychological Research," Psychol. Bull., Vol. 56, No. 1, pp. 26-47, there is an error in the next to last sentence of the appendix. This sentence should read:

"Before M; can differ significantly from Mu, both M; must be found to be significantly different from Ms, and M, must differ significantly from Mz."

In other words, a given pair of means cannot differ significantly in the layer method unless all larger subgroups to which it belongs are also significant. When a particular subgroup is found to have a nonsignificant range no further tests are made within it, just as the total range must be significant for any further

(This point is correctly stated on page
RECENT RESEARCH ON HUMAN PROBLEM SOLVING1


CARL P. DUNCAN 
Northwestern University 


The present review summarizes most studies of human problem solving that were published in the period 1946 through 1957. A complete review of the literature on human problem solving would have to include studies in which problem solving tasks were used in research on the subject variable of rigidity. Since Chown (1959) has recently published an extensive review of rigidity, the studies of problem solving which are cited in her paper will not be covered here. In the case of topics where Chown has summarized some of the relevant studies, her paper will be cited along with other pertinent investigations.

Within the area of thinking, the present review covers only experimental and theoretical studies that dealt with the problem solving performances of normal human adults. Thus, the scope of the paper is narrower than that of other recent reviews (Humphrey, 1951; Johnson, 1950, 1955; Russell, 1956; van de Geer, 1957; Vinacke, 1952).


DEFINITIONS 


Attempts to define thinking in general or problem solving in particular appear most clearly in the writings of Humphrey (1951), Johnson (1955), Maltzman (1955), Ray (1955), Russell (1956), Underwood (1952), van de Geer (1957), and Vinacke (1952). The defining characteristics most frequently mentioned are the integration and organization of past experience when the definition refers to all of thinking, and the dimension of discovery of correct response when reference is made to problem solving specifically. Problem solving is considered to be fairly high on the discovery dimension, as one way of distinguishing it from conditioning and rote learning which are presumed to involve relatively little response discovery. Underwood (1952) gives three methods for determining the amount of overlap between conditioning and thinking.

1 The work for this paper was supported in part by the Northwestern University Department of Psychology-School of Education Carnegie Project.

It is of interest to note that nearly 
all writers concerned with definitions 
emphasized that they were trying to 
define thinking or problem solving in 
such a way as to relate them to, not 
separate them from, simpler proc-
esses like learning or perception.
Maltzman (1955) and a few others 
distinguish between productive and 
reproductive processes within think- 
ing, but apparently no one any longer 
seriously defends a sharp distinction 
between higher and lower mental 
processes, particularly between think- 
ing and learning. That issues of this
sort are not completely dead, how-
ever, is indicated by the fact that van
de Geer (1957) attempted to destroy
the productive-reproductive distinc-
tion, and several other writers who
were not primarily concerned with 
definitional problems also felt it nec- 
essary to state that thinking is part of 
learning or association (Cofer, 1957; 
Judson & Cofer, 1956; Judson, Cofer, 
& Gelfand, 1956; Saugstad, 1957; 
Weaver & Madden, 1949). 
A few other writers who have been 
somewhat concerned with problems 
of definition may be mentioned. In 
an extensive study of categorization 
and concept formation, Bruner, 
Goodnow, and Austin (1956) de- 
scribed broad classes of equivalence 
categories, one of which was “func- 
tional” categorization. The authors 
think this category includes at least 
those problem solving tasks where S 
must categorize an object as fitting a 
certain function, e.g., the pliers as a 
pendulum weight in the two-string 
problem. They also suggested that
defining attributes are sometimes 
combined to create either new cate- 


cause they represent one of the few 
attempts in the li 


wood, 1952), 
Galanter and Gerstenhaber (1956) 
define thinking in a way that seems 
to differ sharply from the usual defi- 
nitions (although the reviewer does 
not really understand their position),
and Maltzman’s (1955) definition 
restricts thinking to articulate organ- 
isms. However, disagreement on 
definitions of either thinking or prob- 
lem solving is less than might be ex- 
pected; at least it was possible to 
hold a conference on human problem 
solving where some areas of agree-
ment were evident in the absence of
a definition of the field (Hovland &
Kendler, 1955).

Any further pursuing of the issue 
of a definition of problem solving 
would lead into discussion of proc- 
esses in problem solving behavior, or 
into theory. Both of these topics can 
be handled better after the bulk of 
the empirical studies has been pre- 
sented. 

Most of the remainder of the 
paper is a review of empirical studies 
of human problem solving. Insofar as 
possible, the review is organized in 
terms of the independent variables 
that influence problem solving per- 
formance. The categorization of 
these variables that was finally de- 
cided on is not very satisfactory. In
many cases investigators used highly 
specific variables or conditions and 
failed to suggest any similarity be- 
tween their variables and those used 
by other investigators. Thus, the re- 
viewer’s categories are necessarily 
too arbitrary. 

Most of the studies to be reviewed 
seemed to fall into one of three major
classes. In the first, the independent
variables were introduced prior to
testing on the final problem solving
task, which task was the same for,
and was presented under constant
conditions to all Ss. These studies 
used what is essentially a training and 
transfer design. In the second group, 
the independent variables were intro- 
duced during work on the test prob- 
lem, or were changes in the problem 
itself. The third group contains
studies in which the independent vari-
ables were cer-
tain characteristics of the Ss used.


made to differ- 


processes, and contributions to the- 
ory. 


TRANSFER FOLLOWING VARIATIONS 
IN TRAINING 


Different Methods of Training 


Methods of “understanding.” The 
first four studies reviewed here are 
similar in that all dealt more or less 
with transfer following training by 
memorization vs. training by various
“understanding” methods. Hilgard,
Irvine, and Whipple (1953), Hilgard, 
Edgren, and Irvine (1954), and Cran- 
nell (1956), all used Katona card 
tricks (Katona, 1940) as tasks; For- 
gus and Schwartz (1957) used various 
arrangements of letters. In all 
studies, different groups of Ss were 
first trained on problems solvable 
either by memorization (of, e.g., a 
certain order of cards or letters), or 
by learning, via one or more under- 
standing methods, a principle or 
technique presumably applicable to 
many such tasks. The differently- 
trained groups were then tested for 
recall of training problems, and for 
transfer to both simple and difficult 
new tasks. 

Hilgard et al. (both studies), and 
Crannell reported little or no differ- 
ence among methods on recall or on 
simple transfer tasks, but on more 
difficult transfer tasks certain under-
standing methods, particularly the
“Katona diagram,” produced super- 
ior performance. Forgus and 
Schwartz found that both demonstra- 
tion and discovery of the principle 
led to better performance than did 
memorization on all three tests. 

All four of the studies tested the 
same Ss successively on all three 
types of tests, so the results may have 
been affected by differential transfer 
effects among tests, or by varying in- 
teractions between particular train-
ing methods and particular tests.
Also, the various training methods 
were complex, unanalyzed variables 
that are difficult to evaluate. For ex- 
ample, a principle may be a single 
item, such as a formula, in which case
it is easily learned by an understand- 
ing group; Forgus and Schwartz 
found that their memorization group, 
which had to learn a series of items, 
required about twice as much prac- 
tice in original training as did under- 
standing groups. In contrast, the 
Katona diagram used by Hilgard et 
al. and Crannell, was an understand- 
ing method that required much 
original training. Further, an under- 
standing method may yield either 
positive or negative transfer depend- 
ing on the particular test task; some- 
thing like this apparently occurred 
with the “working backwards” 
method used by Hilgard et al. (1954), 
and Crannell. 

Hilgard et al. pointed to limited 
understanding of even an under- 
standing method as a source of error. 
The same point was made by Burack 
and Moos (1956), who found little 
transfer to solution of a mechanical 
puzzle from either verbal or actual 
presentation of illustrations of centri- 
fugal force, and by Székely (1950a) 
with problems requiring use of hydro- 
static principles. 

The point raised above that results 
may depend on the interaction be- 
tween a particular training method 
and a particular test task is illus- 
trated in Corman’s (1957) study.
Groups given varying amounts of in- 
formation on how to attack Katona 
match problems produced more solu- 
tions than groups given varying 
amounts of information about the 
principle underlying all problems, 
whereas the latter groups did best 
when tested for ability to verbalize 
the principle. The problem of inter-
preting results when Ss are tested
successively on a series of tasks was 
also noted. Although the training 
variables appeared to have signifi- 
cant effects on training, and on simple 
and complex transfer tasks, effects on 
both types of transfer tasks disap- 
peared when number of training 
problems attempted and solved were 
partialled out. 

Székely (1950b) reported that Ss 
trained on the principle of moment of 
inertia by a “modern” method (first
predict and watch demonstration of 
movements of a torsion pendulum, 
then read textbook material on me- 
chanics) did better on the two- 
spheres problem, which requires ap- 
plication of the principle, than did
“traditional” method Ss (read text, 
then watch demonstration). But 
Maltzman, Eisman, and Brooks 
(1956) failed to duplicate this finding. 
Either method, or a combination of 
the methods, produced more solu- 
tions than a control group with no 
training, but there were no significant 
differences among the three experi- 
mental groups. 

Craig (1956) had Ss cross out the 
word that was unrelated to four other 
words; each such group of words 
utilized a different principle. The Ss
who were told the principle applying
to each block learned more during
training than uninformed Ss, and,
probably because of differential
learning, retained more after 31 days.
But on transfer to new items the 
groups did not differ, although both 
did better than they had on training 
items. 

In Buswell’s (1956) study of pat- 
terns of thinking, one experiment 
concerned the discovery of general- 
izations. The Ss were to discover a 
rule whereby they could get sums of 
ordered columns of numbers without 
simply adding. The Ss found the 
problem difficult, and had trouble 
verbalizing the rule. On a test for
transfer to similar problems, about 
half the Ss showed transfer. 

Other methods of training. Ray 
(1957) required Ss to state verbally 
what they were going to do before 
they were allowed to make motor re- 
sponses to a problem requiring turn- 
ing off a light with switches. This 
verbal work facilitated problem solu- 
tion, probably because, as was also 
shown, the verbal work increased S’s 
tendency to respond systematically 
to elements of the problem. 

A specific systematic approach, the 
half-split technique, was taught to Ss 
by Goldbeck, Bernstein, Hillix, and 
Marx (1957); the technique was to be 
applied in a complex lights-and- 
switches apparatus problem. The 
technique was not particularly effec- 
tive until Ss were first taught the 
deductive skill of locating the ele- 
ments of the problem to which a sys- 
tematic approach could be applied; 
then the technique, as a device to 
improve efficiency, was an aid. 

Kendler and Kendler (1956) re-
ported that 3- to 4-yr.-old children
could make a correct inference when
their training had included all of the
separate part-tasks needed to make
the inference. It is possible, however,
that the children used body orienta-
tion cues, in part, to make the cor-
rect response. The Ss had to change
their position with respect to the ap-
paratus in order to learn one (pre-
sumably the crucial one) of the neces-
sary part-tasks. Significant infer-
ential behavior occurred only when
this part-task was the last one
learned, i.e., immediately before the
test trial.

The preceding studies of various
methods of training did not yield par-
ticularly clear-cut results. However,
the studies varied greatly from one
another, and most dealt with rela-
tively unanalyzed situations. Since
whatever one learns probably has
both positive and negative transfer
effects, depending on the situa-
tion, more information is needed
about what specific responses are
learned under a particular training
method, and what responses are re-
quired on a particular transfer task.
Also, attention should be paid
to the amount and breadth of train-
ing; perhaps a particular training method
yields positive transfer only if it
is well learned in all its aspects.


Amount of Training 


Except in studies of set (see later),
little research has been directly
concerned with the effect of degree of
original learning of responses which
might be expected to influence problem
solving performance. This is surprising, since
in other learning situations both
positive and negative transfer effects
on a task are considerably influenced
by variations in amount of practice
on a similar training task.

Some weeks prior to the problem
solving session Marks (1951) gave
some of his Ss a lecture which empha-
sized analysis of a problem into its
components. The lectured group was
not clearly better than the nonlec-
tured group on a problem requiring
finding errors in square roots, al-
though a finer method of scoring solu-
tions produced data indicating some
superiority of the lectured group.

ee (1954) gave one group 
preliminary training on a simpler
version of a problem requiring
turning off lights with buttons. This
training greatly improved perform-
ance on the final problem. More im-
portantly, training interacted signifi-
cantly with length-difficulty of the
problem. With no prior training, the
4-item problem was much eas-
ier to solve and learn than the 6-, 8-,
and 10-item problems, which were
clustered. But after training, the 4-,
6-, and 8-item problems were all
solved about equally well, while 10 
items were still quite difficult. This 
shift in relative difficulty as a func- 
tion of prior training is an important 
finding that should be followed up. 
It also illustrates once again the 
point that results may depend upon 
interactions between particular train- 
ing methods and particular transfer 
tasks. 

Fattu and Mech (1953a) did one of 
the two experiments in which more 
than two amounts of training were 
employed. They compared groups 
given none, some, or much informa- 
tion about locating malfunctions in a 
gear train. Performance increased 
directly as a function of amount of 
training. Sato (1953) also compared 
groups given none, some, or much
prior training with the characteristics
of visual stimuli which were arranged
in certain ways to provide problems.
Difficulty of the problems was also
varied. In general, differences in
amount of training were significant
for child Ss, but problem difficulty
was more important for adult Ss.
Although they performed no ex-
periments, Bloom and Broder’s (1950)
work suggests that problem solving
proficiency may be improved by gen-
eral training that is not tied to par-
ticular kinds of problems. A general
approach to problems (essentially a
checklist) was developed from com-
parisons of the problem solving be-
havior of high grading and failing
college students. Training sessions
with the checklist improved perform-
ance of failing students on various ex-
aminations, although control groups
were not employed. Bloom and
Broder’s laboriously developed check-
list deserves further study; there were
hints in their work that training with
the checklist might transfer posi-
tively to a wide variety of problems.

None of these studies of amount of
training varied some reasonably uni-
dimensional method of training over
a wide range. Even in the Fattu and
Mech and Sato experiments, “much”
training involved a qualitative as
well as a quantitative change over 
“some” training. Only in research on 
set can one find a study relating de- 
gree of original learning, systemati- 
cally varied, to amount of transfer. 


Set 


Some situations are problems for 
adult Ss not because of deficiencies in 
S’s intelligence, motivation, or past 
experience, but because S is set to 
respond in certain ways. These sets, 
or momentarily dominant response 
tendencies, can have powerful effects
in problem solving. Some tasks raise
problems for human adults only be-
cause of wrong sets; under other sets
there is no problem. Perhaps because
of this, much of the literature on set
concerns negatively transferring sets.

Simple sets. Nearly all studies of
what will here be called simple sets
have used either water jars
or anagrams. Since many of
these studies have been reviewed by
Chown (1959), only studies not cov-
ered in her paper will be cited.

The standar 


jars (see Chown) may induce large 


ported that 83% of experimental Ss 
(those given training Problems) made 
set responses on the transfer prob- 
lems, whereas only 0.6% of control Ss 
(no training problems) showed set.
However, the amount of set with
either water jars or anagrams is influ-
enced by a number of variables. Set
was increased by increases in the 
number of training problems (Mayz- 
ner, 1955; van de Geer, 1957), by 
speed instructions (van de Geer), by 
similarity between training and test 
anagrams (Maltzman, Eisman, 
Brooks, & Smith, 1956), and by un-
solvable training problems in some
cases (van de Geer). Studies em-
ploying other variables that have in- 
creased set are cited by Chown. 

Most of the above findings are 
clear-cut, but some qualifications 
should be noted. Van de Geer’s 
(1957) results were chiefly in the form 
of interactions among his six varia- 
bles. Thus, he found that unsolvable 
training problems increased set only 
in boys and only if extinction prob- 
lems were not given prior to transfer 
problems. Also, increasing the num- 
ber of training problems increased set 
most clearly when extinction prob- 
lems were not given. Rhine (1957) 
found that appropriate set (similar 
training and test anagrams) facili- 
tated test performance only when 
training anagrams were difficult and 
Ss had experienced some failure. 
With easy anagrams and success ex- 
periences, there was no difference be- 
tween groups trained under appropri- 
ate or inappropriate set.

Set was decreased by extinction 
problems given prior to test problems 
(van de Geer), by increasing the num- 
ber of water jars (in training prob- 
lems, test problems, or both; Bene-
detti, 1956), and by interpolating
problems having different solutions
among the training problems (Mayz-
ner, 1955; Mayzner & Tresselt, 1956).
Since distributed practice has been
found to reduce set (Chown), the 
Mayzner, and Mayzner and Tresselt 
experiments are confounded because 
interpolating problems during train- 
ing necessarily distributes practice on 
training problems. 

The set studies demonstrate spe-
cific positive or negative transfer
from one response pattern to another,
the direction of transfer depending on
the particular relation between train-
ing and transfer task. However, it
would be expected that a series of
training problems would also produce
a nonspecific positively transferring
effect: learning how to learn (Dun- 
can, 1958; Harlow, 1949), or perhaps 
learning how to solve. Nonspecific 
transfer was demonstrated by Good- 
now and Pettigrew (1956). Groups 
were first trained to respond to spe- 
cific stimulus patterns in a two-choice 
situation, next were given random 
stimulus presentations (presumably 
to extinguish responding to pat- 
terns), and finally were tested for
learning of new patterns. Such
groups tended to learn new patterns
more rapidly than Ss with no prior 
training. The authors believe that Ss 
without prior training have trouble 
because they pay too much attention 
to their own response patterns rather 
than to the stimulus patterns, and 
that this tendency may be a source of 
difficulty in a variety of problems. In 
any case, their results suggest the 
possibly powerful effects of nonspe- 
cific transfer, learning to think or 
learning to solve, in all kinds of prob- 
lems, effects which have been recog- 
nized by only a few writers (Harlow, 
1949; Underwood, 1952; Weaver & 
Madden, 1949). All water jar and 
anagram studies probably included 
some effects of learning to solve, in 
addition to specific positively or 
negatively transferring habits and 
sets. 

With a different type of problem, 
but one which involved set in some 
sense, Lawson, Hillix, and Marx 
(1955), and Hillix, Lawson, and Marx
(1956) found no effect on transfer 
problems of number of reinforce- 
ments during training, and little ef- 
fect of similarity between training 
and transfer tasks. However, the 
problems (guessing circuits in a 
matrix of lights) differed widely from 
those usually used in set studies, and 
their Ss may have been able to dis- 
criminate fairly well between train- 
ing and transfer tasks. In the usual
set study, S has no way of knowing
which is the first test problem (at 
least until he has solved it), a fact 
which probably tends to increase set. 

Very few investigators have used 
anything other than water jars or 
anagrams to study simple set, so 
practically all information comes 
from two rather similar types of prob- 
lems. Other problems are needed, as 
well as methodological work on water 
jars and anagrams. Frick and Guil- 
ford (1957) do not think that water 
jars induce a set of any considerable 
strength, and agree with Levitt 
(1956) that the problems are not a 
good “psychometric or experimental
instrument.” No thorough methodo-
logical study of anagrams was
found, although Wiggins (1956), in 
one part of his study, revealed a 
source of uncontrolled variation in 
anagrams with two solutions. He 
scaled such anagrams in terms of the 
frequency with which one or the 
other solution was given by naive Ss 
and found variation over the entire 
range (.50 to .99 probability of occur-
rence of one of the solutions). Wig- 
gins went on to show that training, in 
the form of brief study of the list of 
words which were the infrequent 
solutions, produced changes from 
giving the frequent to giving the 
infrequent solution. Anagrams in 
which neither solution was especially 
predominant originally were more 
subject to change. This experiment 
suggests that in studies of set, use of 
double-solution test anagrams which 
have, initially, equally likely solu- 
tions would produce a worthwhile re- 
duction in variability. 

It is unfortunate that investigators 
of set almost never presented learning 
curves for either training or test prob-
lems. Analysis by stage of practice
can reveal important information.
For example, instructions to induce
appropriate set may produce better
performance on early, but not on
late, training problems because set 
(or habit strength) can also be de- 
veloped by solving a series of prob- 
lems of the same class. Learning 
curves for transfer problems would 
reveal the locus as well as the per-
sistence of transfer effects, e.g.,
groups with inappropriate set might 
show negative transfer on early test 
problems but not on later test prob- 
lems because of learning how to solve. 
Van de Geer (1957) found that train- 
ing conditions had different effects at 
different stages of transfer practice.

Although there are other factors
that affect set (e.g., subject variables,
see later), the papers already re-
viewed reveal that quite a lot is
known about the functional relation-
ships between a number of independ-
ent variables and simple problem
solving sets. At the same time, most
of the information comes from water
jars and anagrams, tasks that are
sometimes held to exemplify only re-
productive, not productive, thinking
(Maltzman, 1955). Even if one does
not (as the reviewer does not) hold
to this distinction between different
kinds of thinking or problem solving,
there is no question that much more 
needs to be known about set in more 
complex problems. Certain difficult 
“insight” tasks, such as the pendu- 
lum solution of the two string prob- 
lem, appear to be problems only be- 
cause the situation evokes strong, 
though labile, response tendencies 
that do not lead to solution. Some 
information about sets in more com- 
plex problems is developed in the sev-
eral types of experiments reviewed in
the next section.

Complex sets: functional fixedness 
and preavailability. All of these 
studies may be described as attempts 
to produce positive or negative trans- 
fer to a problem by procedures in- 
tended to change the order of dom-
inance either of responses in a hier-
archy, or of whole hierarchies. 
Duncker’s (1945) work introduced 
a type of complex set called func- 
tional fixedness, which may be de- 
fined as inhibition of use of an object 
in one function due to recent prior 
experience with the object's serving a 
different function. Chown reviews 
most of the functional fixedness stud- 
ies that have appeared since Dunck- 
er’s work. In a more recent study, 
van de Geer predicted that if an ob- 
ject were first used in an unusual 
function, no functional fixedness 
would be found when the object sub- 
sequently had to be used in a usual 
function (the typical order in func- 
tional fixedness studies is usual func- 
tion first, unusual function second). 
A screwdriver and a wrench were 
available to loosen a screwhead bolt, 
or to serve as pendulum weight in 
the two-string problem. Although
the number of Ss was small, the re- 
sults appeared to confirm the predic- 
tion. The group that solved the prob- 
lem last tended to avoid the object 
used just previously to loosen the 
bolt, i.e., showed functional fixed- 
ness. But the group that solved the 
problem first did not tend to avoid,
in loosening the bolt, the object that 
had been used as a weight. 
Functional fixedness is a complex 
set with negative transfer effects. 
What are here called preavailability 
studies are attempts to induce com- 
plex sets with positive transfer ef- 
fects. Saugstad (1955) presented, one 
at a time, the various objects neces-
sary to solve the Maier candle prob-
lem and had S list all possible func-
tions for each object; this was called
an “availability” test. On the test,
13 out of 57 Ss gave evidence that the
necessary functions were available,
i.e., listed functions that would later
be necessary to solve the problem. 
All of these 13 Ss later solved the 
problem, whereas Saugstad reported
that 58% of those who did not
indicate that the necessary functions
were available solved the problem.
Although the experiment is not im-
pressive statistically, it does suggest
that problem solving was influenced
by the availability (set, domin-
ance) of responses in the hier-
archy of responses associated with
each component of the problem.
Staats (1957) had Ss list uses for a
screwdriver and other objects, then
solve the two-string problem with the
screwdriver as the only object heavy
enough to serve as pendulum weight.
Only some of the 61 Ss initially indicated
use of a screwdriver as some sort of
weight, whereas 55 Ss eventually
solved the problem. Although this
part of the experiment was incon-
clusive, Staats did find low but sig-
nificant correlations between time to
solve and frequency and latency of
weight responses given in a postsolu-
tion listing of screwdriver uses. He
suggested that these correlations be-
tween verbal (listed uses) and instru-
mental (problem solving) response
classes seem to indicate that problem so-
lution would have been facilitated if
weight responses had been elicited
in sufficient numbers prior to solu-
tion.
A different method of manipulat-
ing preavailability was used by Jud-
son, Cofer, and Gelfand (1956).
Their Ss first learned several 5-word
lists, among which i 
words, in various P 


various contexts, relevant to solution 
of the later-presented problem,
an pendulum 


evant to the two 
ceiling, an 
he Maier hat- 


that learned a 
three key words was better than 


other experimental 
groups at producing pendulum solu-
tions to the string problem, or floor- 
to-ceiling solutions of the hatrack 
problem. Not all of the many dif- 
ferences (there were two replications 
of the string problem experiment) 
were statistically significant, and the 
findings were limited to men. Women 
produced few solutions of the desired 
type to either problem. 
Judson et al. also reported an ex-
periment showing that reinforcement
of one response, taken from a previ- 
ously elicited chain of free associa- 
tions, significantly increased the 
probability of occurrence of other 
words in the same chain. Brief men- 
tion was also made of two attempts to 
facilitate solution of the string prob- 
lem by prior elicitation of free associ- 
ations to a list of words, one of which
was rope. In the first experiment,
those who had given “swinging”
associations to rope produced sig-
nificantly more pendulum solutions
than those who had not, but these
results were not confirmed in the
y the general 
] their experiments tended 
ion that set and 

problem solving can be 
interpreted in terms of response hier- 
archies which are influenced by char- 
acteristics of the problem and by re- 
inforcement. 
In an oft-cited experiment, Maier
(1930) claimed to have demonstrated
that relevant past experience is not
always sufficient to solve a problem.
The Ss must also have
sort of set or
past experience an p
to enable them to
ast experience to i
Madden (1949) and Saugsta
repeated the ess
n
that supposedly
serves as direction. Neither found
that addition of direction increased 
number of solutions. Saugstad also 
experimented with the three part- 
tasks Maier had used to provide rele- 
vant past experience for the test task 
(two-pendulum problem). Solutions 
of the two pendulum problem in- 
creased directly from demonstration 
of the part-tasks (Maier’s procedure), 
to solving them as problems them- 
selves, to solving them when one of 
the three was presented in an im- 
proved version. Saugstad held that 
“availability of functions” was all 
that was necessary to solve the prob- 
lem. 
Weaver and Madden pointed out 
that Maier ignored nonspecific past 


adults; adult Ss “know” 
responses, but do Not ha 
rect set. 

One other study might be classi-
fied under preavailability. Kolers
(1957) used problems requiring ab-
straction among forms presented on
a screen. A cue form that would aid
solution of the problem was flashed
subliminally just before presentation
of the problem. The results were un- 
clear in the first experiment, but in 
a second experiment there was some 
evidence that the subliminal cue 
aided problem solving. 

A possible reason why the preavail- 
ability studies did not yield clear-cut 
results is that the various situations
were usually needlessly complex,
even cluttered. (To some extent this 
was also true of the functional fixed- 
ness studies.) For example, S was 
asked to list uses for several irrele- 
vant objects as well as the crucial ob- 
ject, or was asked to solve the prob- 
lem in the presence of several irrele-
vant objects. No necessary purpose
seems to be served by these or other
ways of complicating the situation.

Moreover, such complex situations
probably generate a potpourri of


difficult to analyze and which may 
Judson et al. 
be implying this 
criticism of overcomplexity when
they suggested that their experiments 


$ 1956; 
Maltzman, Eisman, & Brooks, 1956;
Székely, 1950b; van de Geer, 1957).

Stolurow, Hodgson, and Silva 
(1956) found negative transfer in air- 
plane mechanics from both orders of 
presentation of school training and 
brief job experience. Herman and 
Engstrand (1957) devised two classes 
of problems, one depending on posi- 
tion of letters on cards, the other de- 
pending on relationships in the alpha- 
bet. The results showed: positive 
transfer between problems of the 
same class, zero transfer from posi-
tion to alphabet problems, negative 
transfer from alphabet to position 
problems. 

Swartz (1955) did not find any ef-
fect on solution of a problem devised
from playing cards by prior sorting 
of the cards into suits. 

It was pointed out earlier that dif-
ferential, and unknown, transfer ef-
fects may have been operating in a 
number of problem solving experi- 
ments. The research on order of 
presentation suggests that in design- 
ing studies of problem solving, one 
should not ignore the possibility that 
the experimental design may permit, 
even reinforce, differential transfer 
effects. 

In over-all view, this major section
on training and transfer in problem
solving appears as follows. In the
case of problems which depend on 
simple sets, the effective training 
variables were largely the same vari- 
ables, operating in much the same 
way, that influence transfer per- 
formance in other learning situations. 
No such summary statement can be 
made about the antecedent variables 
for any other types of problems, al- 
though a few kinds of complex sets 
had some effect. In part, research on 
complex problems has yielded con- 
flicting results; more importantly, 
too little research has been done.
Furthermore, experiments on complex
problem solving are mostly of the
simple two-group type; studies in
which even one variable was system- 
atically manipulated over a wide 
range are almost nonexistent. Sys- 
tematic variation cannot, of course, 
be undertaken until variables are 
identified and dimensionalized, but
little analytic work of this kind has 
been done in research on complex 
problem solving. 


VARIATION WITHIN THE PROBLEM 


In this group of studies, either con- 
ditions concurrent with the problem, 
or characteristics of the problem it- 
self, were varied. The experiments 
are extremely heterogeneous. Be- 
cause of this, no good defense can be 
offered for the subcategories used. 


Methods of Presenting the Problem 


This category includes studies in 
which the same problem was presented
in different modes or appearances.
The different modes were usually,
but not always, isomorphic to
each other in the sense that relation- 
ships among the elements of the 
problem remained the same. 

Concreteness. Many problems can 
be presented in either symbolic or 
concrete (real) form, in various de- 
grees of these extremes, in miniature 
scale models of the real presentation, 
etc. Also, degree of overtness of S's 
behavior, insofar as it is under the 
investigator's control, has been used
as a method of varying concreteness. 


The following studies found no effect
of concreteness of the
problem, or of concreteness of S’s behavior:
Saugstad
(1957) with a miniature scale model
vs. the real presentation of the two-pendulum
problem; or Lorge, Tuckman,
Aikman, Spiegel, and Moss
(1955a, 1955b), who used the mined
road problem at seven “levels of [...]”
(verbal, photographic, miniature
scale model, or real presentation,
or various amounts of manipulation
of the scale and real versions).
In Saugstad’s repetition of the Maier 
experiment, the “direction” was a 
hint that was supposed to call atten- 
tion to the ceiling. Saugstad thought 
that a miniature scale model, of the 
actual hallway in which the two- 
pendulum problem usually must be 
constructed, would call more atten- 
tion to the ceiling, but neither num- 
ber of solutions nor behavior of fail- 
ing Ss gave any indication that the 
ceiling was a special source of diffi- 
culty. 

In contrast to the preceding studies, 
Cobb and Brenneise (1952) and Gibb 
(1956) found rather clear-cut effects
by varying concreteness. Cobb
and Brenneise reported that anchor, 
reach, and extension solutions of the 
two-string problem decreased as con- 
creteness decreased over four steps. 
Pendulum solutions were little af- 
fected but were few enough so that 
for all types of solutions combined, 
percentage solutions were perfectly 
correlated with increasing concreteness.
Gibb used three types of subtraction
problems presented in three
degrees of concreteness to second 
grade children. Both main variables 
were significant on most measures, 
and did not interact. If children are 
more affected by concreteness than 
are adults, Gibb’s results would not 
necessarily conflict with the studies 
reporting no effects of concreteness.
But there is no obvious way of accounting
for Cobb and Brenneise’s
positive results. They did use what
is probably more of an “insight”
problem than did the studies report- 
ing negative results, but it was only 
the insight (pendulum) solution that 
was not affected by concreteness. 
Their least concrete mode of presen- 
tation seems qualitatively different 
from the other three modes, but this
would not account for all their results.

Distribution of work and rest. Pe- 
riods of work and rest on a problem 
can be varied in a number of ways. 
Riley (1952) found no clear-cut difference
between intertrial rests of 8
sec. vs. 2 min. during learning of a 
rote list that required S to discover, 
to varying degrees, the response term 
for each stimulus. He noted that if 
anything, his results were the re- 
verse of the hypothesis that massing 
of practice should produce better 
performance early in learning be- 
cause it should facilitate discovery, 
whereas distribution should be better 
later in learning because it should 
facilitate fixation (Underwood, 1949, 
reviews the older studies from which 
this hypothesis was developed). 

Distribution of practice had clear- 
cut effects in Shaklee and Jones’ 
(1953) experiment when work and 
rest cycles were varied prior to solu- 
tion of a kind of matching-by-infer- 
ence problem. Groups worked under 
continuous practice, under cycles of 1 
min. work-30 sec. rest, or cycles of 1
min. work-4 min. rest. In a second
experiment the latter cycle was
changed to 1 min.-99 sec. cycles. In
both experiments, the first and third
cycles, i.e., continuous practice and
the quite distributed cycle, did not
differ in terms of percentage of solutions,
but both were significantly superior
to the 1 min.-30 sec. cycle. This
U-shaped function between correct
solutions and distribution did not occur
with incorrect solutions, which increased
directly with distribution of
practice.

It is rather clear that distribution
of practice in problem solving needs
further study.


Other methods. Each of the following
studies used a unique method of
presentation. Katz (1949, experiment
more briefly reported in 1950)
had adult Ss give sums based on the 
numbers 1-9; with children, the 
numbers were 1-5. Each number 
was printed on a card. The cards 
were presented in what might be 
called “degrees of disorder,” e.g.,
cards were presented in order in a 
column, in an unordered column, 
after being shaken in a box, etc. Time 
to give sums, and errors, increased 
directly with increasing disorder of 
presentation, both in children and in 
adults. 

The calculus of propositions tasks 
(see Moore & Anderson, 1954b) were 
presented by Anderson (1957) as if 
they had from 1-4 goals or solutions, 
when in fact there was only one goal. 
The number of Ss achieving the goal 
decreased directly as number of 
stated goals increased. This result 
may be roughly similar to one which 
apparently occurs with the two-string 
problem. Instructions to find as
many solutions as possible, vs. in- 
sistence on the pendulum solution 
only, seem to elicit anchor, reach, and 
extension, at the expense of pendu- 
lum, solutions. 

Two other studies found no effects 
of different methods of presenting the 
problem. Fattu and Mech (1953b) 
reported no differences attributable 
to interrupting Ss at various stages 
of work on gear train problems and 
asking them to state verbally where 
the malfunction was. Hafner (1957) 
found no effect in fourth grade chil- 
dren of instructions to verbalize 
while working on a stencil design
problem. 

In general, degree of concreteness
has had little effect on problem performance
in adults, except in Cobb
and Brenneise’s (1952) experiment.
Studies using other methods of varying
presentation are too
dissimilar to summarize.


Variation among Elements of the Problem


These studies are also, in a way, 
methods of varying presentation of 
the problem, but in this case there 
was usually a real change in the problem
itself, e.g., a change in the number,
order, or kind of problem elements.
Perhaps the Katz (1949) and
Anderson (1957) experiments could 
just as well have been included here, 
as well as some experiments on 
simple sets. In the latter, it has been 
found that interpolation of various 
conditions among the test problems 
will reduce set, e.g., extinction problems
(van de Geer), additional jars
(Benedetti). 

In Judson and Cofer's (1956) ex- 
periment Ss had to select the word 
that was out of place in groups of
words, each group containing two 
ambiguous and two unambiguous 
words. The Ss clearly chose on the 
basis of the first-appearing unambig- 
uous word; in the authors’ terms, 
“priority of activation of a response 
hierarchy” significantly influenced 
behavior. Increasing the number of 
ambiguous words between the two 
unambiguous words increased the 
dominance of the first-occurring un- 
ambiguous word. 

Surprisingly strong effects of spatial
contiguity among elements of a
problem were reported by Kay
(1954). The Ss had to turn off a row 
of lights three feet away from a row 
of switches, using as a cue numbers
printed in a random arrangement on
a card. When a light came on, S assigned
it a number from 1 to 12 (left
to right), located the number on the
card, and pressed the switch in line
with the number. Time and error
scores increased directly as the card
was first placed directly in front of
the switches, then moved to midway
between, finally placed directly in
front of the lights. A few Ss, especially
older ones, could not do the
task at all if the card was anywhere 
beyond the midpoint, i.e., closer to 
the lights than to the switches. The 
effects of contiguity might not have 
been so great if Ss had performed the 
most difficult task (card directly in 
front of lights) first, then transferred 
to the easier tasks. Nevertheless, the 
results clearly suggest that intraprob- 
lem contiguity is of fundamental im- 
portance in problem solving. It is 
possible that contiguity among the 
elements of a concrete problem heavily
determines the degree to which
such processes as reordering and restructuring
(Wertheimer, 1945) can
occur.

Solley (1957) made use of the fact 
that the meanings of small, white, 
light, and up tend to be positively
correlated, and opposite to large, 
black, heavy, and down. Different 
sets of discs (boxes), each incorporat- 
ing one of these dimensions, were 
used in the disc transfer problem. In 
six trials through the problem there 
were fewer errors when boxes had to 
be moved in the normal light-to-heavy
direction than in its opposite.
The ordinary size-cue expectancy,
small-to-large, produced fewer extra
moves and shorter time than its opposite.
Other comparisons were not
significant.


In Cobb and Brenneise’s experiment,
anchor, reach, and extension
solutions of the two-string problem
decreased, pendulum solutions increased,
when the investigators
changed the group of objects ordinarily
available to an alternative
group that was more relevant to
pendulum solutions.

Studies of behavioral processes in
problem solving (see later) sometimes
also report changes in performance
due to variation among problem elements.
Battig (1957) had Ss guess
the letters of a word with foreknowl- 
edge only of the number of letters in 
the word. The particular words used 
were a major source of variance; both 
length and frequency of usage of the 
words were complexly related to the 
several response measures. Hunter 
(1957) used different ways of stating 
problems of the type: A is greater 
than B, C is greater than B, which is 
greatest. There were differences due 
to the ways of stating the problems, 
to atmosphere effects, and to type of 
relation used (happier-sadder, taller- 
shorter, etc.). Hunter’s study has 
some similarity to earlier research 
(not reviewed here) on atmosphere 
and order effects in syllogistic reasoning.

In contrast to the experiments on 
methods of problem presentation, 
studies of variation among problem 
elements consistently reported at 
least some significant effects, occasionally
powerful effects, on problem
solving performance. Thus, performance
on a problem may or may
not be influenced by contextual vari- 
ables, such as methods of presenta- 
tion that do not change relationships 
among elements of a problem. But 
changes of a problem's internal struc- 
ture usually influence performance, 
even in cases where the problem remains,
in some physical sense, the
same.


Difficulty 


All variables that significantly af- 
fect speed or frequency of solution 
could be said to influence the diffi- 
culty of a problem. The studies to be 
reviewed here are those in which 
some condition intended to influence
difficulty was deliberately varied. 

All experiments on methods of
“understanding” (Corman, 1957; 
Crannell, 1956; Forgus & Schwartz, 
1957; Hilgard et al., 1953; Hilgard et 
al., 1954) used several transfer tasks,
some called simple, others difficult.
In some cases, but not always, it appeared
that different training methods
produced differences only on difficult
transfer problems.

Performance is rather clearly affected
by deliberate increases in
problem difficulty. Within limits,
problem difficulty is increased by
increasing: the number of stimulus
items with number of response items
held constant (Brush, 1956), the
number of stimulus-response or total
items (Brush, 1956; French, 1954), or
the response availability, defined as
the number of response items from 
which the correct response for each 
stimulus must be selected (Brush, 
1956; Noble, 1955; Noble, 1957; 
Riley, 1952). These studies make an 
important contribution to knowledge 
of S-R relationships in problem solving;
in particular, the response availability
experiments represent direct
attacks on the important dimension 
of response discovery. The most ex- 
tensive work on response availability 
is that by Noble. He showed that 
with four stimuli, difficulty increased 
directly as number of available re- 
sponses per stimulus increased from 
4 to 10, but that there was relatively 
little further increase in difficulty 
from 10 to 14 alternatives. 

Ling (1946) and John (1957) give 
detailed protocols of changes in be- 
havioral processes that occur when 
problem difficulty is increased. Ling 
used Köhler-type tool problems of 
increasing difficulty with young chil- 
dren. John developed a complex de- 
vice called the PSI (Problem-Solving 
and Information) Apparatus (see 
also John & Miller, 1957) which was 
used with two levels of difficulty with 
adults. In the Goldbeck et al. (1957)
study of the half-split technique, use 
of different levels of difficulty on 
their apparatus revealed that the 
technique was of no value on the
more difficult problems until Ss were
first given training on deductive 
skills. 

As might be expected, performance 
usually varied as a function of prob- 
lem difficulty. Noble’s work shows 
that the function is not necessarily 
linear. 


Hints and Aids 


Various hints, aids, or instructions, 
given S just before or during work on 
a problem, have been used to facili- 
tate solution. Maltzman, Eisman, 
Brooks, and Smith (1956) found that 
instructions influenced solution of 
test anagrams regardless of the class 
of training anagrams or the type of 
instructions that S had been given 
for training anagrams. In one of 
Burack and Moos’ (1956) experiments,
three increasingly-concrete
hints concerning the principle of centrifugal
force were given one at a
[...] intervals while S
worked on the mechanical puzzle.
[...] given, five
of the eight Ss had managed to solve
the problem.


Experiments [...] various
kinds were of more primary concern
have been reported by [...]
(1951) and Marks (1951).
[...] study was based on Duncker’s (1945)
notion of “explication of the goal.”
Several experiments were done on
two problems: make triangles out of
matches, and fit together pieces of
wood to form a tetrahedron. Experimental
groups received hints at regular
[...] while working, each suc[...]
increas[...] intended to
[...] problem; eventually,
significantly more experimental Ss solved
[...] in all experiments and on both problems.

Marks’ Ss tried to locate errors in 
square root problems. Two kinds of 
hints were used, a list of possible 
sources of error, or E’s urging S, at 
intervals, to ask himself where a 
mistake could occur. As Marks pre- 
dicted, verbal urging increased both 
S's vocalizations (naming or pointing 
to problem elements), and the num- 
ber of solutions, but contrary to pre- 
diction, the list had no effect on 
number of solutions. Verbal urging 
yielded tetrachoric correlations of
.94 with vocalizations, .82 with solutions.

All of the studies on aids found that 
at least some kind of aid was effec- 
tive, sometimes very effective. It is 
curious, then, to find an occasional 
study reporting that Ss were aided if 
necessary, but not reporting how 
many Ss were aided or if aid had any 
effect.

In summary of this major section 
on variation of conditions during the 
solving of a problem, it may be noted 
that almost all variables studied have 
influenced performance. The major
exception is the class of diverse pro- 
cedures called methods of problem 
presentation which, except perhaps 
for concreteness, yielded either con- 
flicting results or too few results with 
any one method to warrant a conclu- 
sion. 

The studies reviewed in this sec- 
tion illustrate a weakness that runs 
through the whole area of problem 
solving research, viz., the heterogeneity
of problems and techniques employed.
About 100 empirical studies,
several of them including more than 
one experiment, are covered in this 
review. In nearly half of these studies 
the problem used was devised by the 
authors and has not yet been used by 
anyone else; even a brief description 
of each of these problems would have
added materially to the length of this
paper. This diversity is a major reason
why the area of problem solving
seems so chaotic, and is a serious 
obstacle to systematic progress. A 
few authors stated the advantages 
that their new problems were pre- 
sumed to have, occasionally in sep- 
arate publications (Marx, Goldbeck, 
& Bernstein, 1956; Moore & Anderson,
1954b), but most did not. Problem
solving research would be improved
if more efforts were made to
meet the standards for problems set 
by Ray (1955). 


SUBJECT VARIABLES 


Although a few experiments on 
problem solving have been expressly 
designed to test for effects of various 
characteristics of human Ss, most 
papers report effects of such variables
as by-products. Therefore, most of
the studies to be reviewed here have
been cited earlier and will be only
briefly described.


Sex Differences 


Not infrequently, men have been 
found to be better problem solvers 
than women, but close examination 
of the literature reveals some qualifications
of this finding.

Van de Geer (1957) reported two 
experiments showing that 12-yr.-old 
girls were both more susceptible to 
set and less able to surmount set than
were boys of the same age. Van de
Geer also showed that girls developed
no more set from two training prob- 
lems than did boys, but that with six 
training problems girls developed so 
much set that unsolvable problems 
or speed instructions did not further 
increase their set. Rhine (1957) reported
no sex difference in set on test
anagrams.

Men produced more pendulum so- 
lutions of the two-string problem 
(Cobb & Brenneise, 1952; Judson,
Cofer, & Gelfand, 1956; Staats,
1957), but not more anchor, reach, or
extension solutions (Cobb & Brenneise).
Staats suggested that women
may have had more trouble with the
pendulum solution because he found
that they gave significantly more
“hammer” uses for the available
pendulum weight (screwdriver) than
men. In Cobb and Brenneise’s experiment
men produced more total
solutions than women under concrete
methods of presenting the problem,
[...] under more abstract methods.
Judson, Cofer, and Gelfand also
found that women produced fewer
floor-to-ceiling solutions of the hatrack
problem than men, and were
differentially affected by the various
preavailability conditions.

In studies with other complex
problems, sex differences have been
[...] occasionally. Hilgard et
al. (1954) found high school boys superior
to girls on Katona card problems.
Saugstad (1952) used five
complex problems to test his hypothesis
that incidental memory
would correlate negatively with ability
to solve such difficult problems.
Significant negative correlations were
found for boys but not for girls.
McNemar (1955) found men significantly
superior on a battery of
reasoning items selected from some
of Guilford’s tests, but Staats (1957)
found no sex difference on the Abstract
Reasoning Test of the Differential
Aptitudes battery.
[...] (1954) tested school children at several
age levels on arithmetic reasoning;
boys were slightly superior to
girls, but significantly so at only one
age level. Engelhard (1955) also
found few sex differences when boys
and girls were tested on a variety of
instruments dealing mostly with
arithmetic problem solving.

Perhaps the best study on sex differences
is that by Milton (1957).
She used 20 brief problems, 10 of
which were said to require restructur- 
ing or altering initial set, 10 of which 
involved straightforward solution. 
Men were significantly superior on all 
measures. However, score on the 
Terman-Miles M-F Scale, and the 
combined score from two other M-F 
scales, both correlated significantly 
with problem solving score. When 
M-F scores were partialled out in a
covariance analysis, sex was not a 
significant variable on problem score. 
Furthermore, Terman-Miles scores 
contributed significant beta weights 
to problem scores within each sex. 
Thus, Milton argues that sex-role 
identification, learning of which be- 
gins in childhood, is an important 
variable in problem solving skills. 
This study, and Staats’ (1957) work 
on sex differences in response hier- 
archies, are important contributions 
to the issue of sex differences in prob- 
lem solving; both suggest that such 
differences as do occur result from dif- 
ferential past experience. 


Age Differences 

Age is usually an effective variable 
in most types of problem solving.
Some of the age studies have been re- 
viewed by Chown (1959). In other 
studies, Sato (1953) found that chil- 
dren were more affected by amount of 
training than by difficulty level of 
the problems, whereas the reverse 
was true for adults. Katz (1949) 
found both children and adults to be 
hindered by his various methods of 
presenting the problem, but differential
effects of age could not be measured
since the children had been
given easier problems. Hunter (1957)
found that 16-yr.-olds did better
than 11-yr.-olds on his syllogistic-like
problems. Moraes (1954) gives [...]
patterns of thinking [...]
school children [...]
arithmetic reasoning problems.


414 CARL P. DUNCAN


In this group of studies, Ss have 
been differentiated mainly on the 
basis of chronological age. No system- 
atic attempts have been made to re- 
late differences associated with chron- 
ological age to other variables, e.g., 
mental age (but see John, 1957). 


Reasoning Ability 


Scores on the Abstract Reasoning 
Test of the Differential Aptitudes 
battery were compared with several 
other performance measures by Staats 
(1957). The test correlated —.33 
(P <.01) with log time to solve the 
two-string problem, but showed near- 
zero correlations with various pre- 
availability scores derived from listed 
uses for the screwdriver. In Maltz- 
man, Eisman, and Brooks’ (1956) 
study with the two-spheres problem, 
Ss above the median on the Abstract 
Reasoning Test produced more solu- 
tions than did Ss below the median. 

The most extensive study relating
reasoning ability to other measures of
problem solving was done by McNemar
(1955). A battery of four types of
reasoning items, not correlated highly 
with intelligence, was used to select 
a group of high and a group of low 
reasoners from a large group of 
students. These groups were com- 
pared on free and controlled associa- 
tion tests (fluency), on induction and 
deduction problems (ability to bring 
past experience to bear), and on 
water jar problems (variability).
Highs and lows were, as predicted,
not different on free association, but
highs produced increasingly more
words as association became increas- 
ingly controlled. Highs were better 
in accuracy and speed on the induc- 
tion problem, but better only in ac- 
curacy on the deduction problem. 
McNemar found rather varied re- 
sults when Ss were questioned about 
their use of various methods of attack
on problems; however, the data did
suggest that highs were better at “selecting”
among relevant and irrelevant
aspects of past experience. On
water jar problems, she found no dif- 
ference between highs and lows on the 
five training problems, but taking ac- 
count of a problem found to have two 
solutions, highs solved more than 
lows, as they also did on all problems 
(including two criticals, one extinc- 
tion) combined. Highs and lows did 
not differ in susceptibility to set, but 
highs were considerably better able 
to surmount set. 

Although the studies are few, the 
results are fairly consistent. Reason- 
ing, as measured by various tests, has 
been found to be related to most
measures of problem solving performance.


Motivational Variables 


Scores on the Taylor Anxiety Scale 
have been related to performance on 
a few problem solving tasks. In the 
case of simple set problems, high 
anxiety has usually been found to 
produce stronger set (Chown). 
Mayzner and Tresselt (1956) did not 
get a clear-cut relation between anxi- 
ety and set in one experiment, but in 
a second experiment the low-anxious 
group produced significantly more 
direct solutions on test problems. 

Staats (1957) found that Taylor 
Anxiety scores correlated only .11
with log time to solve the two-string 
problem. However, a comparison of 
high- and low-anxious groups would
have been advisable since such 
groups may differ significantly in 
performance even though the over-all 
correlation between anxiety and performance
is low. At the same time,
Staats’ finding of no particular relationship
between anxiety and performance
on a complex problem
agrees with the results of Maltzman, 
Eisman, and Brooks, who found no
relationship between anxiety and
performance on the two-spheres prob- 
lem. These authors also found no re- 
lationship between performance on 
the spheres problem and a test that 
presumably measured neuroticism. 

Findings with other variables here
classed as motivational were reported
by Rhine (1957) and Judson and
Cofer (1956). Rhine found no relation
between anagram solving and
scores on McClelland’s n Achievement
Test. In some of Judson and
Cofer’s sets of four words, one of the
two unambiguous words was a “religious”
word, the other was not.
Frequency of church attendance
among Ss was significantly related to
frequency of exclusion of the nonreligious
word on some of the items.

There seems to be some relationship
between scores on the Taylor
Anxiety Scale and performance on 
simple set problems. Research on 
other motivational variables is too 
sparse to permit generalizations, al- 
though this review does not cover 
the literature in which problem solv- 
ing tasks were used to study person- 
ality (see Chown, 1959; Levitt, 1956), 
nor studies of the effects of social 
attitudes on syllogistic reasoning. 


Other Individual Difference Variables 


Other subject variables that have 
been employed in the study of prob- 
lem solving will be mentioned briefly. 
Koyanagi (1953) and Corman (1957) 
compared groups of high and low 
mental ability. Koyanagi’s bright 
children learned to cover a hole in a
path so that a ball they were rolling
along the path would not fall
through. The dull children’s set for
rolling the ball seemed to prevent
their learning the anticipatory response
of covering the hole. Corman’s
brighter high school students
benefited from large amounts of information
on how to attack Katona
match problems, but less bright students
were able to utilize only small
amounts of such guidance. This dif- 
ference was even more apparent when 
information about method and about 
rule were combined; with limited 
time on each problem, less able stu- 
dents could not integrate and use 
large amounts of information or guid- 
ance. 

Differences associated with S's an- 
alytic habits have been noted by a 
few investigators. Behrens and Miles 
(1957) reported that trained observ- 
ers were consistently able to categor- 
ize Ss as analyzers or nonanalyzers on 
the basis of Ss’ verbal statements 
concerning their approach to block 
design problems. Classification as analyzer
or nonanalyzer correlated .77
in one group, .84 in a second group,
with time to solve the problem.
Bloom and Broder (1950) emphasized
analysis of the problem as part
of their training for problem solving
because of differences in analytic approach
which they had noted in comparing
good and poor problem solvers.
In various indirect ways, some
studies of problem solving
processes [...] suggest
the advantage of [...].
Compare also Hilgard’s et al. (1953,
1954) repeated emphasis on errors
due to attitudes of carelessness (although
this may be a motivational
difference), and John’s (1957) description
of the differences in approach
[...] problems shown by Ss
[...] the natural sciences vs.
[...] in other disciplines.
[...] (1957) study, Ss were
differentiated in
[...] set on a long series
[...] problems. These groups
were then given three tests designed
to measure variability of response.
[...] the groups [...]
number of sensible responses,
but degree of set was inversely
related, in two tests, to number
of different principles used among
responses. In other words, set was 
not related to mere variability of be- 
havior, but “discriminatory varia- 
bility” decreased as set increased. 

Subject differences are sometimes
studied by comparing groups of Ss 
identified as good or poor problem 
solvers. Fattu, Mech, and Kapos 
(1954) differentiated such groups on 
pretest gear train problems, then 
gave both groups two kinds of train- 
ing lectures and followed each lecture 
with a test on additional problems. 
The good group remained better 
problem solvers even on the last test. 
The greatest progressive improve- 
ment over tests was shown by some, 
but not all, of the poor group. The 
good group was much better on
a magnitude-of-error measure, and
showed less stereotypy. The patterns 
of search behavior shown by the poor 
group became increasingly similar 
to the patterns exhibited by the 
good group, but this improvement 
was not accompanied by much in- 
crease in number of problems solved. 
Fattu et al. criticized time as a meas- 
ure in problem solving because there 
were no differences in time scores for
groups, tests, problems passed, or
problems failed.

Fattu’s et al. finding that initial
differences between good and poor 
problem solvers were reduced but 
were not eliminated by training, is 
an important result, and one which 
has been found, directly or indirectly, 
in a number of other Studies, e.g., 
Battig (1957), Bloom and Broder 
(1950), and John (1957). In Battig’s 
word-formation problems, high scor- 
ing Ss were shown to have more ap- 
propriate letter preferences (a power- 
ful variable in these problems) than 
low scoring Ss, who were more bound 
by alphabet order. High Ss showed
better search patterns; they exhibited
response consistency at the beginning
of a problem, variability later. In
contrast to Fattu et al., Battig’s
groups were differentiated by time
scores; high Ss took more time per
guess.
Other studies reporting various 
comparisons between groups differen- 
tiated as good or poor are: Bloom and 
Broder (1950); Engelhard (1955); 
Goldbeck et al. (1957); Hillix et al. 
(1956); Kliebhan (1955); Lawson et 
al. (1955); Moraes (1954), and 
Székely (1950b). Bloom and Broder, 
and Moraes made extensive com- 
parisons of differences in problem 
solving processes between good and 
poor groups. Engelhard and Klieb- 
han selected high and low groups of 
girls and of boys on both an intelli- 
gence test and an arithmetic problem 
solving test. The highs were
significantly better on 15 other tests.
Goldbeck et al. found that the half-split
technique was of more help to high
ability Ss. If, in Székely’s study, one
ignores the differences attributed to
different methods of training (which
were not confirmed by Maltzman,
Eisman, & Brooks, 1956), the results
are even more clear-cut; those who
understood the principle underlying
an earlier, different problem (reported
in Székely, 1950a) produced more
solutions of the two spheres problem.
Lawson et al. found that in choice of
alternative solutions on the test prob- 
lems, slow learners were more af- 
fected than fast learners by set in- 
duced by the training problems, but 
in the Hillix et al. experiment, where 
the test problem was related to the 
training problem in several ways, fast 
and slow learners did not differ in 
choice of solution. 
Research employing only or primarily
correlational techniques, such
as psychometric and factor analytic
studies, is not reviewed here. However,
several studies cited elsewhere
in this paper also included comparisons,
usually via correlation, among
subject variables, among different
response measures, among response
measures and various standardized
tests, etc. A few of these comparisons
were mentioned earlier; others, which
are too varied to be summarized here,
are in Battig (1957); Frick and
Guilford (1957); Maltzman, Eisman,
and Brooks (1956); Marks (1951);
McNemar (1955); Saugstad (1952);
and Staats (1957).
In summary of the several categories
of subject variables reviewed
above, it may be said that nearly all
variables for which at least a
few references are available have had
some influence on problem solving
performance (see also studies of subject
variables reviewed by Chown).
Furthermore, the effects of some of
these variables were not always limited
to one particular kind of problem;
the effects tended to be somewhat
general. At the same time, research
[...] often comes out with detailed
findings that are difficult to relate
either to each other or to the findings
of other studies. Research
modeled after the studies of Fattu,
Mech, and Kapos (1954), McNemar
(1955), or Milton (1957), or perhaps
experiments based on Guilford’s
(1956) factors, would yield more
systematic knowledge.


INDIVIDUAL VS. GROUP PROBLEM SOLVING


Several carefully done experiments
in the recent literature bear on the
question of whether groups solve
problems better than do individuals.
Taylor and Faust (1952) and Lorge,
Tuckman, Aikman, Spiegel, and
Moss (1955a, 1955b, 1956) found
groups superior on at least some
response measures. Taylor and Faust
compared individuals, groups of two,
and groups of four on the game of
“Twenty Questions.” All Ss were
instructed that number of questions 
was the important score. On number 
of questions and on time, twos and 
fours did not differ, but both were sig- 
nificantly better than individuals. 
Failures decreased directly from indi- 
viduals to twos to fours (all differ- 
ences significant). On an efficiency 
measure (man-minutes: number of 
persons X time), individuals were bet- 
ter than twos, twos were better than 
fours. The authors could also have
shown that individuals took by no
means twice as many questions as
twos or four times as many as fours.
Practice effects over days did not dif- 
fer as a function of the three condi- 
tions. 
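
The efficiency measure described above is simple to compute: man-minutes are the number of persons multiplied by elapsed time. A minimal sketch follows; the solution times are invented for illustration and are not Taylor and Faust's data, and the function name is hypothetical.

```python
def man_minutes(n_persons: int, minutes: float) -> float:
    """Efficiency cost: number of persons multiplied by elapsed time."""
    return n_persons * minutes


# Hypothetical mean solution times in minutes (NOT data from the study).
hypothetical_times = {1: 5.0, 2: 3.5, 4: 3.0}

# Cost in man-minutes for each group size.
costs = {size: man_minutes(size, t) for size, t in hypothetical_times.items()}

for size, cost in sorted(costs.items()):
    print(f"group size {size}: {cost:.1f} man-minutes")
```

With numbers of this shape, larger groups finish sooner in clock time yet cost more in man-minutes, which is the pattern the review reports.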

The first two studies by Lorge et 
al. (1955a, 1955b) were cited earlier 
in connection with their seven meth- 
ods of presentation of the mined road 
problem. Individuals were compared
with groups of five under all methods.
For scoring, a content analysis was
made of S’s written solutions and
crucial aspects of the solution were
weighted. There were highly signifi- 
cant differences in favor of groups 
over individuals on this “quality-of- 
solution” measure under all methods 
of presentation, with no interaction. 
Groups asked more questions than 
individuals, suggesting that group 
superiority was in part due to ob- 
taining more information. In an- 
other experiment (Lorge et al., 1956), 
only the real presentation of the prob- 
lem was used. Group superiority was 
again evident. It was also shown 
that in their written reports, groups 
tended to underestimate the quality 
of their actual solutions (as measured 
by reliable observers). In part,
individuals tended to overestimate their
solutions. In these several experiments,
Lorge et al. did not report an
efficiency measure; groups were
better in over-all quality of solution, but
were almost certainly not five times
better either in over-all quality or in
[...] component of solution.

McCurdy and Lambert (1952),
Moore and Anderson (1954a), Mar- 
quart (1955), and, perhaps, Comrey 
and Staats (1955), found no evidence 
for group superiority. The McCurdy 
and Lambert problem required turn- 
ing six switches. Working individu- 
ally, S turned all switches; working 
in groups of three, each S turned two 
switches. Groups were no better than 
individuals, and leaderless groups 
were no different from groups in 
which one S gave directions that the 
others had to follow. 

Moore and Anderson first matched 
groups of three Ss with individual Ss 
on knowledge of the calculus of prop- 
ositions tasks. Over a 10-day period 
of solving problems, individuals did
not differ significantly from groups
on: number of problems solved,
mean steps taken on problems, mean
time on solved problems, mean errors,
or on two measures of repetitiousness
of response. On a man-hour
basis, individuals were almost
three times as efficient as groups.
Moore and Anderson had forced 
groups to agree on steps in solving, so 
one member would not dominate; 
thus, they noted that groups had to 
work as groups, a responsibility not 
saddled onto individuals. 

Marquart (1955) repeated and expanded
the oft-cited Shaw (1932)
study, using eight problems of various
kinds. All Ss worked on all problems,
both as individuals and as
members of groups of three. By
Shaw’s method, involving comparison
of total solutions to total possible
solutions, groups were superior. But
since this method does not indicate
whether a group solution was merely
due to the best member, Marquart
combined individuals into “groups”
of three. By this method, groups
working as groups were no better
than groups working as individuals.
Marquart also used her method to 
reanalyze Shaw’s data, and found 
little difference between Shaw’s 
groups and individuals. 

The Ss solved crossword puzzles 
in Comrey and Staats’ (1955) study, 
first solving individually, then in 
pairs where one S had the vertical 
code, the other the horizontal code. 
It was shown that 82% of the vari- 
ance on the group task could be pre- 
dicted from a linear combination of 
perfectly reliable high and low indi- 
vidual scores. 

The results of the preceding group 
of experiments can be fairly easily 
summarized. On “over-all” types of 
measures, groups have been superior 
to individuals on a few problems, but 
not on most problems. But where 
efficiency measures were reported, 
and also probably where they were 
not reported, individuals were supe- 
rior. 

Although theories of problem solving
will be taken up later, Lorge and
Solomon’s (1955) paper is of interest
here since it deals with two models of
group problem solving. Working
with Shaw’s (1932) data, Lorge and
Solomon noted that some groups
solved all the problems, some solved
none. This suggested that group
superiority was due to the abilities of
members of the group, rather than to
interpersonal interaction. So two
ability models were proposed: (A)
group superiority is a function only of
the ability of one or more of its members
to solve the problem without regard
to acceptance or rejection of
members’ suggestions; (B) group
superiority is a function only of the
pooled abilities of its members.
Pooled abilities can produce solutions
even though no member of the
group can solve alone. Model B implies
that any problem may be composed
of, and solved in, two or more
stages, so it reduces to Model A for
one-stage problems. Model A was
found to be tenable for two of Shaw’s
three problems. It was also shown
that Model A can be modified for
stage-wise solutions. From applying
Model B, the authors concluded that
Shaw’s data suggest, not personal
interaction, but pooled abilities in
two-stage problems, i.e., Model B.
Perhaps Marquart’s (1955) method
of combining individuals into
“groups” to compare with actual
groups, and her finding that these
two types of groups did not differ, is
a statistical pooling of abilities which
produces solutions even without
face-to-face contact.
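
Model A can be illustrated numerically if one assumes, for the sake of the sketch, that members solve independently and each has the same probability p of solving alone; a group of k then contains at least one solver with probability 1 - (1 - p)^k. The code below computes that expectation and also forms all possible nominal "groups" from individual records, in the spirit of Marquart's method. The function names and the sample records are hypothetical and are not taken from any of the studies cited.

```python
from itertools import combinations
from typing import Sequence


def model_a(p_individual: float, k: int) -> float:
    """P(at least one of k independent members can solve alone)."""
    return 1.0 - (1.0 - p_individual) ** k


def nominal_group_rate(solved: Sequence[bool], k: int) -> float:
    """Share of all k-member nominal groups that contain a solver."""
    groups = list(combinations(solved, k))
    return sum(any(g) for g in groups) / len(groups)


# Hypothetical records: 2 of 6 individuals solved the problem alone.
records = [True, True, False, False, False, False]

expected = model_a(2 / 6, 3)            # independence assumption
observed = nominal_group_rate(records, 3)  # exhaustive nominal groups

print(round(expected, 3), observed)
```

The two figures differ slightly because the nominal groups are drawn without replacement from a small sample, whereas the closed form assumes independent members.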


PROBLEM SOLVING PROCESSES 


There seems to be more concern
with behavioral processes, as presumably
different from products, in the
field of thinking and problem solving
than in any other area of learning or
performance. As compared to the
literature before 1946, recent investigators
tend more and more to report
only products, e.g., so many Ss solved
the problem, so many did not. Even
so, perhaps half of all papers cited
in this review have had something or
other to say about processes. Obviously,
only major studies of processes
can be reviewed here, and these
only very briefly. Studies of, or
discussions about, problem solving
processes are often long and extremely
detailed.

“Processes” can mean almost anything:
insight vs. trial and error,
response variability, flexibility vs.
rigidity, methods of attack, basic
processes such as perception, memory,
intelligence, learning, etc. Other
so-called processes are sometimes named
and described in terms of the
characteristics of the particular problems
used in a study. This diversity
precludes any close comparison of the
results of different studies. Furthermore,
processes are sometimes studied
merely by giving a single group
of Ss a problem and describing the
Ss’ behavior in verbal or frequency
distribution form; there may be little
attempt to quantify processes or to
vary conditions. Some of the distinction
between process and product
would disappear if more efforts were
made to determine functional
relationships between dimensionalized
processes and systematically varied
conditions.

Bloom and Broder’s (1950) reme- 
dial work with failing college students 
was based on observations of success- 
ful and unsuccessful problem solvers, 
i.e., students who did well or poorly 
on problem solving types of examinations.
Detailed descriptions of
differences in problem solving behavior,
and in personality, between good
and poor solvers were reported. The
Ss’ responses were classified under:
understanding the nature of the problem,
understanding the ideas contained
in the problem, general approach
to the solution of problems,
and attitude toward problem solving.
All of these classes revealed differences
between good and poor solvers.
The authors also noted that good and
poor solvers differed not so much in
having relevant information, but in
applying it to a problem. McNemar
(1955) reported a somewhat similar
finding. Bloom and Broder claimed
that problems have a figure-ground
organization in that some elements 


and that some elements, 
ly figural ones, furnish 
much more than 
others. It seems likely that such
figural elements of all kinds are prime
inducers of sets.

In Buswell’s (1956) very extensive
study, over 500 Ss were observed
while working on various mathematical
problems, while attempting to
discover and transfer generalizations
(this portion of the study was cited 
earlier), and while selecting from 
cards the steps and methods they 
wanted to use in solving a problem. 
Buswell reported great individual dif- 
ferences, and trial and error rather 
than systematic approaches. In no 
case were as many as 20% of the 
group represented by any one pat- 
tern of thinking; the evidence gave 
no support to any notion that prob- 
lem solving must follow precise rec- 
ipes. 
Earlier it was mentioned that John 
(1957) found rather consistent dif- 
ferences between those trained in nat- 
ural sciences and those trained in 
other disciplines on his PSI problem. 
Actually, John tested six groups, 
varying in kind and amount of educa- 
tional background, on two levels of 
difficulty of the problem. Eight work
variables, four information variables,
and nine approach variables were
studied and intercorrelated, and
changes in these patterns of behavior
from the simpler to the more difficult
problem were reported. These data
cannot be summarized here, nor can
John’s over-all description of the
problem solving process. Some of the
points that were emphasized were
that past training and experience
brought about habituation of an
individual to certain kinds of conceptual
and organizational processes
which were consistently displayed,
that some aspects of personality were
reflected in the problem solving process,
and that present level (not type)
of academic training did not appear
to change parameters of effectiveness
on the problems to any great extent.
Goldner (1957) studied “whole-part
approach” and “flexibility-rigidity”
on six problems. Although there
were the usual individual differences,
intra-individual consistency was
fairly high from problem to problem
on the whole-part variable.
Flexibility-rigidity was also fairly consistent
on similar tasks, but not on tasks
that differed in structure. The two
dimensions were separate processes
in less structured problems, but were
closely related in more structured
problems.
Practice effects, which seem ubiq- 
uitous in other areas of performance, 
have not always been found in problem
solving. One example (there are a
few others) occurred in Bendig’s
(1953, 1957) work on patterns of be- 
havior in solving twenty questions 
problems. Bendig’s interest was in 
the information transmitted by ques- 
tions and used by Ss. Although there 
were changes over problems in some 
of the information measures, and 
other significant effects, there was no 
learning, at least by Bendig’s method 
of measurement, over problems in 
either study. These results concern- 
ing practice effects conflict to some 
extent with those of Taylor and Faust 
(1952), but Bendig did not use the 
twenty questions game in the usual 
way, and Taylor and Faust’s work 
was conducted over a much longer
series of problems.
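
The idea of information transmitted by a question can be made concrete with the standard Shannon measure: an answer that narrows the field from n_before equally likely alternatives to n_after yields log2(n_before / n_after) bits. The review does not describe Bendig's actual measures, so the sketch below is only an illustration under that assumption, and the function names are hypothetical.

```python
import math


def bits_gained(n_before: int, n_after: int) -> float:
    """Bits of information from an answer that narrows the alternatives."""
    return math.log2(n_before / n_after)


def max_alternatives(n_questions: int) -> int:
    """Most alternatives separable by n ideal yes/no questions."""
    return 2 ** n_questions


# An ideal question halves the field and yields exactly one bit.
print(bits_gained(8, 4))
# Twenty ideal yes/no questions can separate 2**20 alternatives.
print(max_alternatives(20))
```

A question that splits the remaining alternatives unevenly yields less than one bit on average, which is why question quality as well as question count matters in this game.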
Two other major studies of problem
solving processes are those by
Moraes (1954), and Süllwold (1954).
Part of Moraes’ study was cited earlier
in other connections; the major
part of the work is the detailed protocols
of thinking processes obtained by
comparing children who were good
vs. those who were poor at arithmetic
reasoning. Süllwold claimed
that problem solution has two phases,
one of sudden insight, the other where
progress is slow, and that individual
differences in exhibiting these phases
were consistent from problem to
problem. However, van de Geer
rather thoroughly disputes Süllwold’s
claims.
The preceding studies do not exhaust
all that has been said in the recent
literature on problem solving
processes (see Chown). Most of the
theoretical articles to be reviewed
later, as well as all of the books and a
good many of the empirical investigations
that were cited in other connections,
have included something about
processes. Among empirical studies,
extensive data or discussion relevant
to processes appear in Ling (1946),
[...] (1947), and Weaver and
Madden (1949).
In the reviewer’s opinion, it would
be preferable to devote more effort to
determination of functional relationships
between environmental or task
variables and performance or product,
rather than to problem solving
processes. In oversimplified terms,
determination of what the simple
laws are must precede attempts to
determine why and how the laws operate.
At the same time, research on
processes would make a greater contribution
if efforts were made to define
some sort of rough classification
of behavior patterns on which investigators
could agree and which
could be used in more than one
study. Possible starting points would
be Bloom and Broder’s (1950) checklist,
Guilford’s (1956) factors, etc.
At present, the chief weakness of
research on behavior patterns in problem
solving is that the research area
itself is so unpatterned.


THEORY 


It is encouraging to find that in an
area as unintegrated as is research on
problem solving, there are a number
of good theoretical beginnings. The
most thoroughgoing attempt in the 
recent literature to develop and test 


a theory of problem solving is that by 
Maltzman and his associates. In the 
major theoretical paper (Maltzman, 
1955), the idea of the habit family 
hierarchy, derived primarily from 
Hull, is used. The divergent, trial 
and error mechanism (one stimulus 
leading to a hierarchy of responses 
in which the correct response has low 
initial strength), and the convergent, 
discrimination learning mechanism 
(one response is led to by a hierarchy 
of stimuli in which the correct stim- 
ulus is initially low in the hierarchy), 
are combined to assume a compound, 
temporal hierarchy. Reinforcement 
or extinction of individual members 
of a hierarchy are assumed to generalize
to other members. Changes in
order of dominance in a hierarchy, or
among hierarchies, may be produced
by extinction of dominant incorrect
responses or response families, by
increasing the reaction potential of the
correct response through mediated
generalization from other reinforced
members, or by elicitation of fractional
anticipatory goal responses.
Concerning extinction of dominant
incorrect responses, Maltzman
pointed out that spontaneous recovery
may occur; thus, interfering responses
may recur repeatedly while S
is working on a problem. (An example
of this apparently occurred in
the problem used by Kay, 1954.)
Mediated generalization, a basic notion
in the theory, is said to be accomplished
primarily by linguistic responses.
Fractional anticipatory goal
responses are used to interpret set;
they are responses that are evoked by
instructions, hints, etc. This is a valuable
suggestion; although sets of all
kinds play an important role in many
types of performance, they have received
little attention from learning
theorists.

Failure to solve a problem, inability
to overcome wrong set, and
similar phenomena, can be accounted
for by Maltzman’s theory. He points
out that if the correct response is low 
in the hierarchy, generalized inhibi- 
tion from repeated unsuccessful oc- 
currences of the dominant incorrect 
response may reduce reaction poten- 
tial of the correct response below the 
threshold. Also, high irrelevant
drive, such as anxiety, will not only
produce competing responses, but by
increasing the total drive will multiply
all habit strengths, thereby
increasing the advantage of a dominant
incorrect response over a weaker correct
response. As noted earlier, the
prediction concerning irrelevant 
drive has received some confirmation 
in simple set problems (Chown, 1959; 
Mayzner & Tresselt, 1956). (Predic- 
tion of the effect of irrelevant drive 
on other problems is difficult; a dominant
incorrect set should be increased
in strength, but the simultaneous
occurrence of competing responses
might facilitate solution by
increasing response variability.)
Other predictions from Maltzman’s 
theory, and some expansions of the 
theory, appear in his several empiri- 
cal studies, cited earlier. The theory 
does not come to close grips with such 
tasks as the two-string problem, but 
it is still one of the most fruitful the- 
ories yet offered in problem solving. 
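
The drive argument above rests on a multiplicative rule: reaction potential is taken to be drive times habit strength (E = D x H in Hullian notation). Under that assumption, raising drive widens the absolute gap between a dominant incorrect response and a weaker correct one. A minimal numerical sketch follows; the habit strengths and drive levels are arbitrary illustrative values, and the function names are hypothetical.

```python
def reaction_potential(drive: float, habit: float) -> float:
    """Multiplicative rule assumed here: E = D * H."""
    return drive * habit


def dominance_gap(drive: float, h_incorrect: float, h_correct: float) -> float:
    """Advantage of the dominant incorrect response over the correct one."""
    return reaction_potential(drive, h_incorrect) - reaction_potential(drive, h_correct)


# Arbitrary illustrative values (not data from any study cited).
low_drive_gap = dominance_gap(1.0, h_incorrect=0.8, h_correct=0.3)
high_drive_gap = dominance_gap(2.0, h_incorrect=0.8, h_correct=0.3)

print(low_drive_gap, high_drive_gap)
```

Doubling drive doubles the gap, so the weaker correct response falls further behind even though its own reaction potential has also risen.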
Borrowing from Maltzman and 
others, Cofer and his associates 
(Cofer, 1957; Judson & Cofer, 1956; 
Judson, Cofer, & Gelfand, 1956) emphasize
particularly the role of verbal
responses as mediators in response
hierarchies. The several experiments
(reported in the two 1956 papers and 
cited earlier) deal either with varia- 
bles that produce changes of dom- 
inance in verbal response hierarchies, 
or with the effects of such changes on 
problem solution. The former type of
experiment was quite successful; the
latter was fairly successful. Although
Cofer deals mostly with hierarchies
among verbal responses, as does 
Maltzman, there are also, of course, 
hierarchies among instrumental re- 
sponses. Staats (1957), borrowing 
from Osgood (1953), developed his 
experiment on the basis of possible 
relationships between verbal and in- 
strumental hierarchies associated 
with the same stimulus object. 

A phenomenological theory of 
problem solving has been presented 
in some detail by van de Geer (1957). 
Briefly, different aspects of the same 
object may appear in perception; 
therefore, situations vary in degree 
of “transparency.” In thinking, 
other aspects of the situation must 
be explicated, thereby reducing the 
nontransparency of the situation. In 
connection with this theory, van de
Geer makes a worthwhile effort to
classify problems. His major categories
are three “points of view”
toward problems: in what way does S
try to solve, what is the nature of the
difficulty of the problem, what is the
nature of the initial and the goal
situations. Each point of view provides,
to some extent, a classification
of problems. For example, the distinction
sometimes made between insight
problems and trial and error
problems appears, in other terms,
under “nature of the difficulty.”

Saying that a phenomenological
theory does not lead directly to a program
for experimental research, van
de Geer goes on to present an axiomatic
approach to problem solving,
based on game theory and information
theory. He shows how the model
handles each of the types of problem
listed under his “nature of the difficulty”
category. In this connection
van de Geer claims that S’s intelligence
and “thinking-out capacity”
determine how difficult S will find a
problem to be, and therefore whether
S will show insight or trial and error.
This attempt to reduce insight and
trial and error to a single underlying
principle has some similarity to
Galanter and Gerstenhaber’s (1956)
[...] of the two patterns of behavior.

van de Geer’s point that phenomenological
theory does not easily generate
experimentally testable hypotheses
also applies to other
lists, descriptions of processes,
steps toward solution, etc.,
that have been offered in the area
of thinking and problem solving. In
contrast, Underwood (1952)
presents a combination of theory and
orientation toward research that
directly [...] manipulatable variables.
To begin with, thinking, including
both concept formation and problem
solving, is said to be the learning
or recognizing (discovering?) of
[...] or functional relationships
between stimuli. Stimuli may include
objects, symbols, or other relationships
(as in syllogisms). The basic
assumption is that for the perception
of relationships among stimuli to occur,
the appropriate responses to
these stimuli must be contiguous.
The reviewer would interpret this
assumption to mean that when presented
separately, S₁ and S₂ lead, or
can be made to lead, to the same R₁.
Since both stimuli lead to the same
response, there is a relationship between
them which, however, will not
necessarily be perceived unless they
are presented in such a way that
both R₁s occur contiguously. An
additional mechanism could also be
assumed: the first R₁ produces stimuli,
traces of which overlap with occurrence
of the second R₁.

Whether or not this is a correct
interpretation of Underwood’s basic
assumption, it is clear that manipulatable
variables in thinking are those
factors that increase or decrease
response contiguity. Underwood mentions
such factors as mode of presentation
of stimuli, number and similarity
of stimuli, several kinds of
biases, and memory. He points out 
the importance of response hier- 
archies, and indicates that the theory 
leads to a number of predictions. One 
of the predictions is that massed 
practice, by preventing forgetting, 
may be better than distributed prac- 
tice in thinking. The literature bear- 
ing on this point is conflicting. Al- 
though Underwood’s theory is not as 
easily applied to some types of com- 
plex problems as it is to other types 
or to concept formation, it is more di- 
rectly tied to a basic process (contig- 
uity) than is any other theory, and is 
one of the best single sources for re- 
search hypotheses. 

Except for van de Geer’s phenomenological
theory, all of the theories
discussed so far are S-R behavioristic
types. Unfortunately, much less has
been done to expand Gestalt theory.
Distinctions between productive and
reproductive thinking,
the role of insight, and experiments
on functional fixedness, explication
[...]al, and water jars set, all
stem more or less from Gestalt origins.
Humphrey (1951), and van de
Geer (1957) indicate strengths and
weaknesses of Gestalt theory, and
Saugstad (1957) rejects it. But only
Helson and Helson (1946) have made
a serious attempt to generalize the
theory to new situations. Their approach
is to show that configurational
principles also apply to abstract,
symbolic [...] as well as
to Wertheimer’s (1945) geometric
[...]. They go through [...]
[...]tion) in [...]
rather than by trial and error or by
[...] of high-level mathematical
[...] that reorienting
the equation by use of new symbols
aids solution, and this reorienting
is said to be the same process that
operates in geometrical problems or
in perceiving hidden figures. (Others
have directly attempted to relate re- 
organization in certain problems to 
scores on hidden figures tests: see 
Chown, 1959; Frick & Guilford, 
1957.) Helson and Helson consider 
that substitution of symbols or of new 
symbols is the distinguishing mark of 
abstract thinking, and that it is fre- 
quently desirable to replace concrete 
features with symbols since symbols 
are easier to manipulate and also tend 
to suggest new combinations. This 
point may have some relation to the 
“concreteness” studies reported ear- 
lier; if Ss do tend to replace concrete 
features with symbols, the failure of 
most of the concreteness studies to 
find any difference among various 
perceptual or symbolic modes of 
problem presentation would be understandable.
Indeed, Helson and
Helson conclude that no sharp line
can be drawn between concrete and
symbolic procedures; most individuals
use both in actual thinking.

Gestalt theory, with its emphasis 

on reorientation within a problem, 
also bears some relationships to studies
of “variation among problem elements,”
and to studies concerned
with “methods of understanding.” It
is possible that Ss could be trained in 
reorienting as a method of under- 
standing, and that such a skill would 
transfer to a wide variety of prob- 
lems. 

Other theoretical contributions 
may be mentioned briefly. Flavell 
and Draguns (1957) hold that both 
thought and perception undergo a 
very brief but important microde- 
velopment. Their suggestions deal 
with a matter to which too little attention
has been paid, viz., the sets
that are instantaneously induced
upon initial perception of a problem.
Stolurow, Bergum, Hodgson, and 
Silva (1955) present a probabilistic 
model of trouble shooting. The prob- 
ability that each of several defects 
may be causing malfunction in air- 
plane engines and the time to repair 
each defect are combined in a ratio 
to indicate which order of checking 
defects one should follow for most 
efficient repair. It seems likely that 
the same sort of model could be 
worked out for other complex appa- 
ratus problems such as Fattu, Mech, 
and Kapos’ (1954) gear train. Hum- 
phrey (1951), Johnson (1955), and 
Weaver and Madden (1949) all make 
several points relevant to the de- 
velopment of problem solving theory. 
Mayzner (1955) first develops pre- 
dictions from theories of Hull, Wer- 
ner, and an earlier theory of Under- 
wood, then shows how each theory 
fared in comparison to his data. 
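
The ordering rule described for the Stolurow et al. model can be sketched as a sort on the ratio of defect probability to repair time, highest ratio first. The review does not give the exact form of the ratio, so the direction used here (probability divided by time) is an assumption, as are the defect names, the numbers, and the function and class names.

```python
from typing import List, NamedTuple


class Defect(NamedTuple):
    name: str
    probability: float  # chance this defect is causing the malfunction
    minutes: float      # time needed to repair it


def check_order(defects: List[Defect]) -> List[Defect]:
    """Order defects by probability per minute of repair time, best first."""
    return sorted(defects, key=lambda d: d.probability / d.minutes, reverse=True)


# Hypothetical defects (NOT data from Stolurow et al.).
defects = [
    Defect("fouled plug", 0.5, 10.0),
    Defect("fuel pump", 0.3, 30.0),
    Defect("loose lead", 0.2, 2.0),
]

order = [d.name for d in check_order(defects)]
print(order)
```

In this example the least probable defect is checked first because it is so quick to rule out, which shows why probability alone is not a sufficient guide to checking order.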

Several of the prototheories re- 
viewed here seem promising. How- 
ever, they have not yet been directly 
followed up by much experimenta- 
tion, and those who do experimental 
work have made little effort to relate 
their results to what theory is avail- 
able. This lack of rapprochement be- 
tween existing theory and existing 
data is another one of the reasons 
why the area of problem solving 
shows lack of integration. 

Some further points may be made
with regard to theory in problem
solving. First, problem solving in
human adults is to a considerable extent
a matter of transferring past-learned
skills and responses to the
immediate problem situation. In one
way or another this fairly obvious
point has been implied by many investigators,
even those who hold to a
distinction between productive and
reproductive thinking. Yet relatively
little use has been made of existing
transfer theory or data. For example,
many problems can be interpreted as
negative transfer situations.² Some
of the variables of which negative 
transfer is a function are known from 
studies of other types of human learn- 
ing and performance, both verbal and 
motor. These studies provide many 
suggestions for research, and to some 
extent for theory, in problem solving. 
Negative transfer is merely an ex- 
ample. A thoroughgoing transfer ap- 
proach to problem solving could also 
make use of much that is known 
about positive transfer, including the 
possibility of “learning to think” 
(Harlow, 1949; Underwood, 1952; 
Weaver & Madden, 1949). 

Second, except for Underwood's 
(1952) paper, and perhaps a few sug- 
gestions by Bruner, Goodnow, and 
Austin (1956), almost nothing has 
been done to relate, theoretically or 
experimentally, the area of problem 
solving to the large literature on con- 
cept formation. Yet the initial dis- 
covery of relevant among irrelevant 
dimensions in concept formation is 
probably not basically different from 
discovery of the correct solution in 
problem solving. Both Riley (1952) 
in problem solving, and Richardson 
and Bergum (1954) in concept forma- 
tion, have recognized separate dis- 
covery and fixation phases in per- 
formance. The discrete S-R problems 
used by several investigators (e.g.,
Brush, 1956; French, 1954; Marx et 
al., 1956; Noble, 1955; Ray, 1957; 
Riley, 1952) can probably be modi- 
fied to vary continuously from “pure” 
concept formation to “pure” problem 
solving tasks. 

Finally, despite what has been said 
above concerning theory, the review- 
er’s position is the same as that of 
Ray (1955) and Underwood (1952).
These authors emphasize that although
theoretical developments are not
necessarily unwelcome, the basic need
in problem solving is experimental
determination of the functional
relationships between dimensionalized
independent variables and problem
solving performance.


2 Several illustrations of this statement were
pointed out to the reviewer by Rudolph W.
Schultz, whose suggestions were of considerable
help.


CONCLUSIONS 


The following conclusions are sug- 
gested. Problem solving in human 
adults is a name for a diverse class of 
performances which differs, if it dif- 
fers at all, only in degree from other 
classes of learning and performance, 
the degree of difference depending 
upon the extent to which problem 
solving demands location or integra- 
tion of previously learned responses. 
Problem solving performance varied 
most clearly as a function of simple 
sets, of a few kinds of complex sets, of 
changes in the relationships among 
elements of a problem, of level of
problem difficulty, of aids toward
solution, and of certain characteristics
of the subject, especially sex, age, and
reasoning ability. The variables that
influence simple sets were largely 
those that affect performance, and 
that affect performance in similar 
ways, in other situations. Individual 
differences in problem solving proficiency
appeared to be relatively stable.
Problem solving was usually, though not
always, unaffected by differences in the
degree of concreteness or abstractness
of versions of the
same problem. Other variables and 
conditions either yielded conflicting 
results, or more commonly, were em- 
ployed in too few studies to warrant 
a conclusion. 

Groups produced more or better 
solutions to some problems than did 
individuals; on most problems there 
was no difference. Individuals were 
superior to groups on measures of
efficiency. Research on problem solving
processes revealed very sere
patterns of behavior. Problem solving
theories that show some promise
are beginning to be developed.
The field of problem solving is 
poorly integrated. The reasons for 
this seem to be the use of a great vari- 
ety of tasks to provide problems, the 
frequent use of unanalyzed and
nondimensionalized variables, the lack
of an agreed-upon taxonomy of be- 
havioral processes, and to some ex- 
tent the failure to relate data to 
other data or to theory. Problem 
solving particularly needs research to 
determine the simple laws between 
dimensionalized independent variables
and performance.


TOWARD A THEORY OF PAIN: RELIEF
OF CHRONIC PAIN BY PREFRONTAL LEUCOTOMY,
OPIATES, PLACEBOS, AND HYPNOSIS


The response to a nociceptive stimulus
normally includes at least four
components: “the sensation of pain”;
discomfort; withdrawal movements;
and some measurable physiological
alteration, e.g., a transient or prolonged
increase or decrease in blood
pressure (Nafe & Wagoner, 1938;
Goetzl, Bien, & Lu, 1951). This
paper is concerned with the neurological
correlates of this total response,
hereafter termed the pain response,
and how this total response
or some components of this response
can be mitigated or eliminated by
prefrontal leucotomy, opiates, placebos,
and hypnosis.


THE NEUROPHYSIOLOGICAL CORRELATES
OF THE PAIN RESPONSE


Free Nerve Endings: The So-Called 
“Pain Receptors” 


It has generally been assumed that 
the free nerve endings, which are
found widely scattered in cutaneous
and visceral surfaces, are the
specific receptors for noxious stimuli, 
However, Sinclair, Weddell, and 
Zander (1952) have shown that Ss 
can discriminate cold, heat, touch,
and prick just as well from the ear
pinna, which contains only bare
nerve endings and a basketlike network
around the hair follicles, as they
can from the skin of the forearm,
which contains all of the encapsulated
endings which have been described.
Lele, Weddell, and Williams
(1954) have demonstrated that the
free nerve endings in the skin, when
suitably stimulated, “give rise to a
wide range of sensory experience
which includes reports of ‘cold,’
‘touch,’ ‘warm,’ ‘prick,’ ‘itch,’ and
‘sharp pain.’” Lele and Weddell
(1956) have confirmed earlier findings
(e.g., Nafe & Wagoner, 1936) that Ss
report not only pain but also touch,
warmth, and cold when appropriate
stimuli are applied to the center of
the cornea which contains only free
endings. These and other investigations
recently reviewed by Weddell
(1955) and Sinclair (1955) indicate
that a wide variety of sensory experiences
can be evoked by suitable stimulation
of the free nerve terminals
and that theories of cutaneous sensibility
postulating specific receptors
for each sense modality are open to
serious objection at the present time.²


* Postdoctoral research fellow, National
Institute of Mental Health, Public Health
Service. It is a pleasure to thank E. G. Boring,
Asenath Petrie, H. K. Beecher, and DAR
Evans for the many hours they spent critically
reading this manuscript. Although their
valuable advice and suggestions led to the
correction of many errors, responsibility for
the remaining faults and the opinions expressed
remain with the writer. At present
the writer is a research associate at the Worcester
Foundation for Experimental Biology
and Medfield (Massachusetts) State Hospital.


2 The possibility remains that the term free
nerve endings does not refer to homogeneous
units. If future investigations demonstrate
specific biochemical differences between these
endings, the question of specific receptors for
the various sense modalities may require further
investigations at the molecular level.


Peripheral Conduction


A series of earlier investigations,
reviewed by Bonica (1953, pp. 29-30),
had apparently shown (a) that
asphyxia or pressure applied to a peripheral
nerve blocked the large, myelinated,
fast-conducting A fibers
first and abolished touch before pain
and (b) that cocaine blocked the
small, unmyelinated, slow-conducting
C fibers first and abolished pain
before touch. This appeared to be
satisfactory evidence that touch is
correlated with conduction in the
larger A fibers and that pain is correlated
with conduction in the small
C fibers.³ Other data, however, indicate
that noxious stimuli applied to
the cutaneous surface activate many, 
if not all, of the fiber types present 
in the cutaneous nerves, viz., the 
smaller A fibers and the C fibers. 
Since the large A fibers are present
only in the muscle branches of the
nerves and the Group B fibers consist
of sympathetic preganglionic axones
(Lloyd, 1955; Ruch, 1955, p. 334),
they cannot be directly activated by
cutaneous stimuli. Heinbecker,
Bishop, and O’Leary (1933) demonstrated
with a human subject that
electric shock applied to an exposed
nerve in such a manner as to stimulate
the A-delta fibers consistently
evoked a pain response. Zotterman
(1939) reported that a burning stimulus
applied to the skin of cat evoked
a spike composition that included
both the A-delta and C fibers. Brookhart,
Livingston, and Haugen (1953)
demonstrated that stimulation of the
tooth pulp (which normally evokes a
pain response) yields conduction
characteristics of the A gamma-delta
fibers. After reviewing these and
other investigations attempting to
relate the modalities of sensation to
conduction in specific fibers, Livingston
(1943), Bonica (1953), Sinclair
(1955), and Schiller (1956) agree with
Gasser (1943, p. 59) that “the fibers
belonging to different modalities
must be widely distributed throughout
the various fiber sizes, and that
there seems to be little possibility of
associating any one sensation with an
elevation in the electroneurogram.”


3 At the present time, extreme caution is
necessary in drawing conclusions from the
earlier experiments on reversible nerve blocks
produced by asphyxia, compression, cocaine,
etc. (Jones, 1956, 1958; Schiller, 1956). In a
series of carefully controlled studies of nerve
blocks produced by procaine, compression,
and cooling, Sinclair and Hinshaw (Sinclair,
1955) have demonstrated that it is possible
to obtain almost any order of sensory loss by
using different Ss, by varying the site stimulated,
and by altering the nature of the stimulus.

The question of “double pain” and its relation
to conduction in A-delta and C fibers
has also been opened for further inquiry.
“Double pain” may be due to inadequate control
of the stimulus at the receptor level:
Jones (1956) demonstrated that (physiologically
normal) Ss do not report double pain
if the stimulus is prevented from stimulating the
same receptors more than once. Sinclair (1955,
p. 594) has also concluded from his own work
and from earlier investigations in this area
that “the question of second pain cannot be
regarded as settled and the idea of two sets of
pain fibres rests upon work which is not immune
to criticism of the experimental findings
as well as the interpretations placed upon
them.”

If all cutaneous stimuli activate 
“fibers widely distributed throughout 
the various fiber sizes,” what deter- 
mines the differential response to 
each stimulus? To account for this 
differential response, investigators in 
this area (Bishop, 1946; Weddell, 
1955; Sinclair, 1955) hypothesize 
that each cutaneous stimulus sets off 
a pattern of nerve impulses which 
differs from the pattern set off by 
other stimuli in that the relative number
of activated fibers of various sizes
differs, and the relatively different
sizes carry impulses differing in en- 
ergy value, frequency, and duration. 
A light touch, for example, preferen- 
tially activates the larger fibers. 
However, this does not mean that 
these large fibers are specific to light
touch stimuli. The larger the fiber
the lower its threshold. The slight
disturbance caused by light touch, 
therefore, activates the largest fibers 
with the lowest threshold more readily 
than the smaller fibers with higher 
thresholds. Similarly, nociceptive 
stimuli applied to the cutaneous sur- 
face may more readily activate fibers 
in the C range, but this does not 
mean that these stimuli do not also 
activate other cutaneous fibers and 
it also does not mean that other 
stimuli cannot activate the C fibers. 


Conduction in the Spinal Cord 


Peripheral nerve fibers, which are
both myelinated and unmyelinated
and which may belong to either Class
A or C, travel along the . . .
spinal, or sympathetic nerves to
posterior root ganglia where they
. . . der neurons, . . . view is that
. . . to the viscera and cutaneous
. . . complete story, . . . many cases,
cutting . . . spinothalamic tract prevents
a response to many nociceptive
stimuli applied to the contralateral
side. However, in some cases, this 
loss is temporary; normal pain responsiveness
may return after an intervening
period (Ranson, 1943, p. 111).
Also, this operation (anterolateral
cordotomy) does not abolish
pain discrimination while leaving
temperature, touch, and pressure
discrimination intact. As Schiller (1956,
p. 208) points out, “One modality or
two are never either completely
spared or abolished to the absolute
exclusion of the others . . . parts
denervated by anterolateral cordotomy
are reported as feeling ‘numb,’
discrimination of two points and texture
of materials is diminished, and,
in addition, there are thermanesthesia
and analgesia.” In addition,
White and Sweet (1955, p. 45) dem- 
onstrated that a current at 100 or 
more volts applied to the “analgesic 
side” invariably produced a report of 
severe pain in all (40) patients ex- 
amined. King (1957) confirmed this 
finding and, in addition, found that 
(after anterolateral cordotomy) the 
maximum elevation of the pain 
threshold on the “analgesic side” did 
not exceed 40 to 50%. Since, in the
great majority of cases, the pain
threshold elevation was much less
than this maximum, and, in some
cases, was not significantly different
from the pain threshold on the normal
side, King concluded that “a
polysynaptic relay pathway for painful
stimuli in man, aside from the
spinothalamic system, seems prob- 
able.” 

Not only does anterolateral cordotomy
consistently fail to abolish the
response to more intense noxious
stimuli, it also fails to affect the pain
. . . in the majority . . . White
and Sweet (1955, p. . . .) . . .
after this operation, 60% of their
patients consistently reported pain
when multiple rapid pinpricks were
applied to the “analgesic side.”


French and Peyton (1948), Voris 
(1951), and White and Sweet (1955, 
p. 45) have presented additional evidence
indicating that nociceptive
stimuli activate fibers that do not 
cross to the opposite side in the spinal 
cord. Each of these investigators has 
reported cases in which the “analgesia”
was ipsilateral following anterolateral
cordotomy. From these reports
White and Sweet (1955, p. 275)
conclude that the “pain fibres are 
diffused over a very wide area of the 
anterolateral quadrant, and that at 
times some centrally conducting 
fibres must run upwards in the ipsi- 
lateral as well as in the contralateral 
columns, at least for a considerable 
distance.” 

Even the above account is incom- 
plete; nociceptive stimuli can acti- 
vate far more units in the cord than 
those found on both sides of the an- 
terolateral quadrant. Livingston 
(1943) and White and Sweet (1955) 
have found that in many cases bilat- 
eral anterior cordotomy is insufficient 
to relieve a pain syndrome and 
Lhermitte and Puech (1946), Pool
(1946), and Browder and Gallagher 
(1948) have demonstrated that some 
pain syndromes can be relieved by 
posterior cordotomy. Keele (1957, p. 
164) has reviewed additional evidence
which indicates that the “pain
tracts” may be widely dispersed in
the cord and concludes that “one is
induced to look upon their anatomy 
as one of statistical probability, and 
to wonder how closely or how perma- 
nently the function of transmission 
of pain sense is attached to fixed 
neuronal paths in the cord.” After 
summarizing the evidence in this 
area, Adey (1957) similarly concludes 
that the concept of localized fiber 
pathways in the spinal cord carrying
particular types of sensory impulses 
is open to serious question and re- 
quires revision. 

A number of investigators have 
interpreted their data as indicating 
that the same stimulus may activate 
different neural units in the cord
at different times. Gasser (1937)
writes that “a given stream of afferent
impulses over a peripheral nerve
follows one pathway in the centers
at one time and another pathway at
another time. The direction of the
switching is conditioned by the situa- 
tion obtaining at the moment, and is 
always consonant with a coordinated 
reaction of the whole organism.” 
Along similar lines, Livingston (1943, 
p. 25) interprets the evidence as indicating
that “impulses, finding themselves
blocked from their customary
pathways, eventually find new or pre- 
viously unused pathways.” Bishop 
(1944) likewise suggests that when 
impulses along a neural pathway 
reach a certain critical frequency, 
they are “switched” to different con- 
duction units from those into which 
they normally pass. 

A number of other considerations 
should be emphasized. First of all, 
there is no need to hypothesize spe- 
cific pathways for pain and other 
modalities to understand the altera- 
tions in sensibility which follow an- 
terolateral cordotomy. As Sinclair 
(1955, p. 606) has pointed out, “In- 
stead of cutting specific fibres, we 
may be so altering the sensory pat- 
terns the spinal cord is capable of 
conducting in such a way as to lead 
to a sensory dissociation.” Further- 
more, even if we assume a “segrega- 
tion” of “pain-conducting fibers” at 
the cord level, we cannot relate this 
“segregation” to the “sensation of 
pain,” to discomfort or suffering, or 
to other components of the pain re- 
sponse which appear to require 
higher neurological levels. Whatever 
“segregation” of fibers occurs at the 
cord level can be related only to re- 
flex functions at this level and to 
nothing more (Bishop, 1946). It 
should also be noted that afferent 
impulses in the cord can be altered 
by impulses descending from the 
brain stem and cerebrum. Hagbarth 
and Kerr (1954) have demonstrated 
that afferent volleys in the anterior
columns of the spinal cord are reduced
in amplitude by electrical
stimulation of the bulbar and midbrain
reticular formation, of the precentral
and postcentral gyri, and of
various other forebrain structures. 


Conduction at the Brain Stem Level 


There seemed to be general agreement,
just a few years ago, that spinothalamic
pathways “carried pain”
without interruption through the 
medulla, pons, and midbrain to the 
posterolateral ventral nucleus of the 
thalamus. Recent evidence indicates 
that this also is an incomplete ac- 
count. First of all, there is little 
doubt that the great majority of the 
fibers from the anterolateral funicu- 
lus of the cord terminate at levels 
below the thalamus (Walker, 1940; 
Walker, 1943; Bowsher, 1957). Furthermore,
an extensive series of investigations,
recently summarized by
Magoun (1958), indicate that the
“classical sensory pathways” (including
the “pain” . . .
of the brain stem (and to the “diffusely”
projecting thalamic nuclei)
and that appropriate electrical stimulation
. . . of electrical . . .
throughout wide areas . . .
such as is seen in the . . .
in the normal animal.
In line with this evidence, it
has been demonstrated that anesthetics
exert their primary effect in
blocking the response to noxious 
stimuli (as well as other stimuli) by 
preventing conduction through the 
reticular area of the brain stem 
(French & King, 1955). French,
Verzeano, and Magoun (1953) reported
that both sodium pentobarbital
and ether depress conduction
through the reticular formation while
the direct afferent pathways continue
to conduct impulses in normal
manner. Similarly, Arduini and
Arduini (1953), Peterson (1955), and 
Haugen and Melzack (1957) found 
that the potentials in the reticular 
formation were much more susceptible
to procaine, nitrous oxide, and
other drugs than the potentials in 
the direct afferent pathways. 

In summary, recent evidence seems 
to be consistent with Melzack, Stot- 
ler, and Livingston's (1958) conclu- 
sion from their study of brain stem 
lesions in the cat: 

Whatever the nature of pain perception 
may be, its neural substrates appear to be 
much more complex than that envisaged in a
single ascending system. The patterns of impulses
subserving pain appear to travel over
multiple pathways at the brainstem level at
least, and the ultimate perceptual event seems
to depend upon activities occurring along all
of these pathways (p. 365).


Conduction at the Thalamo-Cortical
Level


It has generally been assumed that, 
after synapsing at the posterior ventral
nucleus of the thalamus, “pain
fibers” project to the postcentral
convolution of the cortex. However, as
Walker (1943) has pointed out, the
posterior ventral nucleus has numerous
connections with the adjacent
thalamic nuclei and impulses can be
conveyed to wide areas of the cerebral
cortex in this indirect fashion.⁴
Also, as pointed out above, nociceptive
stimuli applied to visceral,
somatic, and cutaneous structures
also activate neurons in the reticular
formation which sends impulses to
many cortical areas over both thalamic
and extrathalamic pathways.
In line with these considerations,
Gellhorn and Ballin (1946) report
that noxious stimuli applied to the
periphery of narcotized animals alter
electrical activity throughout the entire
cortex, and Benjamin and Ivy
(1949) report that noxious stimuli
applied to the extremities of human
Ss evoke a nonspecific decrease in
amplitude of the waves from the
parietal, occipital, temporal, and
frontal areas.


4 Murphy and Gellhorn (1945) report that
strychninization of this thalamic nucleus also
. . . ting in the ipsilateral and contralateral
hypothal . . .

In general, the evidence summarized
below indicates that “adequate”
stimulation of the cerebral cortex 
may elicit reports of pain; and that 
damage to a number of cortical areas 
may affect “the sensation of pain” 
and the withdrawal movements 
which normally follow noxious stimu- 
lation. 

In rare instances, electrical stimu- 
lation of the cerebral cortex, espe- 
cially of the precentral, postcentral, 
and superior parietal gyri, has been 
followed by “a sensation of pain” 
localized in the face, limbs, trunk, or 
other body area (Penfield & Boldrey, 
1937; Horrax, 1946; Lewin & Phil- 
lips, 1952). Also, in a few patients, 
destruction of the postcentral, supe- 
rior parietal, superior temporal, and 
insular convolutions (Davison & 
Schick, 1935); or tumors in the pari- 
etal lobe alone or in the parietal plus 
the frontal or occipital lobe (Michel- 
son, 1943); or tumors in the right or 
left parietal, frontal, and temporal 
areas (Bender & Jaffe, 1958); have 
been reported to give rise to “spontaneous
pain” referred to various
body areas. However, we cannot
conclude from these reports that the
cerebral cortex “subserves” some special
function in the “perception of
pain.” Electrical stimulation of
many other neural tissues also, at
times, evokes referred “pain sensations”
(Sweet, White, Selverstone, &
Nilges, 1950). Also, a series of investigations,
summarized by Bonica
(1953, p. 131), White and Sweet
(1955, pp. 526-528), and Bender and
Jaffe (1958), indicate that pathological
processes in the spinal cord, the
brain stem, and the thalamus may
also produce “spontaneous pain”
indistinguishable from that produced
by lesions or tumors in the cortex.
Furthermore, the pain response which
follows electrical stimulation or pathological
processes in the cortex could
possibly be integrated by subcortical
structures. Finally, it should be emphasized
that reports of pain following
electrical stimulation or lesions
in the cerebral cortex are so rare that
Penfield and Rasmussen (1950, p. 3)
conclude that “the thalamus retains
the problem of disposing of pain impulses
without calling on the cortex
for essential help.”


5 Although wide areas of the cerebral cortex
are normally activated by peripheral nociceptive
stimulation, it is quite certain that some
of the components of the pain response can be
carried out by animals lacking this structure.
In pontile cats, for example, Bard and Macht
(1958) report that a strong nociceptive stimulus
elicits growl-like vocalizations, protrusion
of claws, running movements, piloerection,
and increased respiratory and cardiac activity.

In rare instances, cortical lesions 
have been reported to prevent “the 
sensation of pain” or “the recogni- 
tion of the stimulus” without affect- 
ing other components of the pain re- 
sponse. Gilliatt and Pratt (1952) 
have reported that after a “right- 
sided cerebral thrombosis” a patient 
did not “consciously recognize” noxi- 
ous stimuli applied to the left side of 
the body, even though the stimuli 
gave rise to general restlessness, increased
blood pressure, tachycardia,
deepening of respiration, and dilatation
of the pupils. Marshall (1951)
has also published 11 cases of left and
right parietal lesions which were followed
by a deficient “pain sensation”
when pinprick, or intravenous injection
of hypertonic sodium chloride,
was applied to areas contralateral to
the lesion. These reports are exceptional;
localized cortical lesions are
not usually followed by alterations in
“the sensation of pain.” Penfield
(cited by White & Sweet, 1955, p.
109), after wide experience with cortical
ablations, states that he has
“never seen a patient who had a 
parietal lesion lose sensation of pain 
excepting for a few hours or days fol- 
lowing excision.” White and Sweet 
(1955, p. 63) similarly conclude after 
reviewing the evidence that “studies 
in man following localized cortical 
extirpations reveal little reduction of 
pain sensation upon peripheral stimu- 
lation, and confirm the huge extent 
of the cortex concerned with sensa- 
tion.” 

Damage to a number of cortical 
areas may affect the purposive with- 
drawal movements which normally 
follow nociceptive stimulation.
Schilder and Stengel (1931) published
a study of 3 patients with
tumors or lesions of the left parietal
lobe (with additional lesions, in two
of the cases, in the frontal or temporal
lobe) who did not withdraw from
noxious stimuli, threatening gestures,
loud noises, or sudden flashes of light.
Similarly, Hemphill and Stengel
(1940) reported that a patient with a
probable lesion of the left labyrinth
failed to show withdrawal responses
to noxious stimuli and to unexpected
sounds. Although the patient “admitted
that he could feel the painful
stimulus” and that he could hear an
automobile horn, he failed to show
withdrawal or defense reactions when
a match was struck close to his eyes
and when an automobile horn threatened
his life. Rubins and Friedman
(1948) have published a similar study
of four patients with lesions in or
around the supramarginal gyrus of
the dominant hemisphere who showed
a lack of withdrawal to noxious
stimuli and to threatening gestures
even though they “felt” pain and 
were aware of the threatening char- 
acter of the gestures. The latter in- 
vestigators emphasize that only cer- 
tain motor withdrawal reactions ap- 
pear to be normally integrated in or 
around the damaged areas.

Although specific unilateral lesions,
in some instances, result in deficien- 
cies in pain responsiveness, it by no 
means follows that the more exten- 
sive the unilateral lesion, the more 
deficient the response. On the con- 
trary, removing either the right or 
left cerebral hemisphere either does 
not seriously affect the response to a 
noxious stimulus or alters only the 
response to lower intensities of stimu- 
lation. Dandy (1933) reported two 
cases of extirpation of the right cere- 
bral hemisphere; in both patients the 
response to a pinprick on the contra- 
lateral side was seriously deficient, 
but movements of joints and com- 
pression of muscles on either side of 
the body brought forth a pain reac- 
tion with an intense “feeling” com- 
ponent. Gardner (1933) found that 
20 months after right hemispherec- 
tomy firm pressure with a pin (on the 
contralateral or ipsilateral side) was 
recognized as “painful.” Zollinger 
(1935) reported that after removal of 
the left cerebral hemisphere (in a 
right-handed woman), the patient 
showed “acute pain with motion of 
the joints or compression of the deep 
muscles.” Rowe (1937) stated that, 
after removal of the right hemi- 
sphere, his patient responded nor- 
mally to nociceptive stimuli applied 
anywhere on the ipsilateral side and 
to scattered areas on the contralat- 
eral side. Somewhat in contrast to 
the above are the later reports of 
Evans (cited by Walker, 1943), Bell 
and Karnosh (1949), Krynauw (1950;
12 patients), and Marshall and
Walker (1950; 4 patients): a few
months after hemispherectomy, most
of their patients showed normal pain 
responses and accurate localization of 
pinprick applied on either side of the 
body. 

From our current neurological con- 
cepts we might assume that after 
hemispherectomy the ipsilateral thal- 
amus integrates the response to a 
nociceptive stimulus on the contra- 
lateral side. This is by no means the 
case. The hemispherectomized chim- 
panzee shows practically complete 
degeneration of all the ipsilateral 
thalamic nuclei which project to the 
cortex (Walker, 1943). There is no 
reason to suspect that the same retro- 
grade thalamic degeneration does not 
occur in man. Apparently, the re- 
maining cerebral hemisphere and the 
thalamic nuclei on the same side are 
sufficient to integrate the response to 
nociceptive stimuli on either side of
the body.

The above studies appear to indi- 
cate the following: 

1. Noxious stimulation in the periphery 
alters electrical activity in many cortical 
areas. 

2. Referred “pain sensations” are at times 
evoked by electrical stimulation or pathologi- 
cal processes at any level of the neuraxis, 
including the cerebral cortex. 

3. In rare instances, localized cortical lesions
abolish “the sensation of pain,” i.e., the
ability to discriminate a noxious stimulus and 
to differentiate it from other stimuli, without 
affecting other components of the pain re- 
sponse. Also, in rare instances, localized cor- 
tical lesions abolish the avoidance movements 
which normally follow noxious stimulation. 

4. Removal of either the right or left cerebral
hemisphere either does not seriously
affect the response to nociceptive stimuli or 
alters only the response to relatively non- 
intense stimuli applied to the contralateral 
side. 

5. Although the decorticate animal shows
some components of the pain response (e.g.,
running movements and increased respiratory
and vasomotor activity), the intact organism
probably utilizes a variety of cortical
neuronal mechanisms when carrying out the
total response to a noxious stimulus. This is
further exemplified in the following discussion
on the mitigation of the discomfort-suffering
component of the pain response by prefrontal
leucotomy.


“RELIEF OF PAIN” BY PREFRONTAL 
LEUCOTOMY 


During recent years an extensive 
group of patients has undergone pre- 
frontal leucotomy (or “lobotomy”) 
for the relief of severe, intractable 
pain syndromes such as causalgia, 
postherpetic neuralgia, metastatic 
carcinoma, thalamic syndrome, etc. 
(e.g., Van Wagenen, cited by Walker, 
1943; Dynes & Poppen, 1949; Free- 
man & Watts, 1950). Although some 
patients died soon after the operation 
and others were not “relieved of 
pain,” others were “relieved” (at 
least for an extended time period) 
and further analysis of this effect may 
give us an increased understanding of
the pain phenomenon. 

First of all, it is necessary to point 
out that intractable pain has been 
alleviated in some patients not only 
by bilateral frontal leucotomy, which 
supposedly destroys the thalamo- 
frontal projections, but also by uni- 
lateral frontal leucotomy; by bilat- 
eral lower quadrant frontal leucot- 
omy; by topectomy (i.e., by remov- 
ing limited areas, such as Brodmann’s 
Areas 9 and 10, from the frontal 
lobes); and by a number of other 
operations on the frontal areas which 
have been summarized by Sargant 
and Slater (1954). Secondly, the
“pain relief” which may follow these
operations does not appear to be
related to the specific prefrontal areas
affected; on the contrary, the degree
of relief appears to be a nonspecific
effect related to the extent of
brain damage (Petrie, 1951;
Hardy, Wolff, & Goodell, 1952;
Petrie, 1952; Elithorn, Glithero, &
Slater, 1958).

It must be further emphasized that 
only some patients have been helped 
by these procedures. Walker (1950)
estimates that at least one third of 
the patients receiving these operations
have not had any “pain relief.”
A representative report is Hardy, 
Wolff, and Goodell’s (1952) analysis 
of 38 prefrontal leucotomies (25 uni- 
lateral and 13 bilateral) performed 
by Dr. Bronson Ray at the New York 
Hospital for the relief of pain syn- 
dromes related to metastatic cancer, 
Hodgkin’s disease, radiculitis, tabes, 
etc. Of the 25 patients receiving uni- 
lateral leucotomy, 10 were relieved of 
pain and 15 showed no alteration in 
their pain syndrome. Of the 13 patients
receiving the bilateral operation,
11 were relieved and 2 were not
helped. The term relief of pain, as 
used by these investigators, implies 
that when the patient was directly


A further 
sized: post 


In a postmortem study of 15 patients
who had undergone transorbital
lobotomy for pain of malignant dis- 
ease, Freeman and Williams (1951) 
found that 3 cases were characterized 
by massive hemorrhage, 2 cases failed 
to involve the thalamofrontal pro- 
jections, and the other cases appar- 
ently showed destruction of the thal- 
amofrontal radiations with retro- 
grade degeneration of the dorso- 
medial nucleus of the thalamus. 
Meyer and Beck (1945) also report
from postmortem studies that the
prefrontal lobe is at times entirely
untouched and that severance of the
thalamofrontal fibers is often incomplete.

When prefrontal leucotomy allevi- 
ates intractable pain it does not nec- 
essarily elevate the pain threshold or 
alter “the sensation of pain.” Chap- 
man, Solomon, and Rose (1950) 
found a lowering of the pain thresh- 
old immediately after the bilateral 
operation followed by a return to pre- 
operative levels after an intervening 
time period. Hardy et al. (1952) re- 
ported that the pain threshold in 10 
postleucotomy patients, who were 
relieved of their pain syndrome, 
showed no significant difference from 
the preoperative level. King, Clau- 
sen, and Scarff (1950) noted a slight 
lowering or no change in the pain 
threshold after successful unilateral 
leucotomy for intractable pain. Also, 
with few, if any, exceptions, investi- 
gators report that the “sensation” or 
“perception” of pain is practically
unaltered by any of these procedures:
e.g., “Prefrontal lobotomy changes
the attitude of the individual toward
his pain, but does not alter the per-


The evidence available at present 
also indicates that, if and when prefrontal
leucotomy relieves a pain
syndrome, the relief is secondary to
a more generalized effect of the operation
which, at times, can be conceptualized
as apathy, i.e., as a decreased
responsiveness to all stimuli,
including nociceptive stimuli.
Hardy et al. (1952, p. 317) empha- 
size that postleucotomy patients who 
were either partially or totally relieved
of pain “exhibited in many
ways . . . a flattened affect if not
actual apathy. . . . They failed not
only to complain of their spontaneous
pain but also of their needs, such
as personal nursing care, need of urine
bottle, bedpan, or the adjustment of 
an uncomfortable dressing. When 
incontinent of feces they were indif- 
ferent to the odor it spread about 
their persons and beds.” Bonner, 
Cobb, Sweet, and White (1952) have 
also emphasized that apathy charac- 
terized their patients immediately 
following bilateral lower quadrant 
frontal leucotomy. Although the 
apathy tended to lessen with the pas- 
sage of time, it was still a character- 
istic feature in patients followed up 
to 36 months postoperatively. 
Although many patients who are 
relieved of intractable pain by pre- 
frontal operations do not show the 
extreme apathy described above, the 
evidence indicates that all patients 
who are helped by these operations 
show a characteristic personality alteration
(Krayenbühl & Stoll, 1950;
Petrie, 1952; Petit-Dutaillis, Mes- 
simy, & Berges, 1953; Elithorn, 
Glithero, & Slater, 1958); and that 
patients who are not helped and pa- 
tients who have undergone other op- 
erations which do not mitigate in- 
tractable pain, e.g., temporal lobot- 
omy, cingulectomy, and orbital un- 
dercutting, do not show the same 
change in personality (Petrie, 1958). 
This characteristic alteration has re- 
ceived a wide variety of formulations: 
Dynes and Poppen (1949) concep- 
tualize it as a decrease in “worry” 
and “concern”; Le Beau (1950) terms 
it the relief of “anxiety”; Elithorn et
al. (1958) formulate it as an “im- 
paired ability to elaborate a persist- 
ing attitude or mood.” These formu- 
lations are not necessarily in basic 
disagreement; they appear to be re- 
ferring to a common behavioral ma- 
trix, viz., to a mitigated “readiness to 
respond” to external and internal 
stimuli. Summarizing the investigations
in this area, Walsh (1957, p.
474) writes: “The patient suffering
from pain complains less of his discomfort
than before. Not . . . a failure
to appreciate the situation but a
failure to respond to it. . . . This failure
to react is seen when stimuli that
arise within the body itself are considered;
but there may also be a diminished
response to external situations.”

It should be emphasized that the 
leucotomized patient is able to re- 
spond normally to nociceptive stimu- 
lation. Hardy et al. (1952, p. 316) 
have reported that “some patients, 
although ostensibly tranquil before 
being asked about their pain, over- 
reacted with a show of grimacing and 
fears when their attention was focused 
upon it by a direct question concern- 
ing its quality and its intensity” (em- 
phasis added). The same theme is re- 
peated by other investigators; for ex- 
ample, Hawkes and Gotten (1948, p. 
209) report that “when questioned 
[the leucotomized patients] all indi- 
cated that they realized some pain 
was present when they thought about 
it” (emphasis added). Apparently,
when the leucotomized patient is di- 
rectly asked to report on his pain, he 
“focuses his attention” on and “thinks 
about” the ever-present nociceptive 
stimulus in his body and, when thus 
reacting to it, often shows discomfort 
or suffering and almost always re- 
ports a “sensation of pain.” How- 
ever, when the patient is not directly 
asked to report on the noxious tissue 
condition, he does not “attend” to it 
or “think” about it to the same extent 
as before the operation and, when 
not thus reacting to it, does not ap- 
pear to be “in pain,” i.e., does not 
show discomfort.

Apparently, discomfort and suffer- 
ing can be minimized or eliminated 
by preventing a “secondary reaction”
to the noxious stimulus. Neurosur- 
geons have used somewhat different 
terms to describe this effect: Freeman
(1949, p. 18) writes after extensive
experience with prefrontal operations 
that “when the emotion is done away
with, the pain either becomes no 
longer significant or actually disap- 
pears”; Otenasek (1948, p. 234) sim- 
ilarly suggests that “when the fear 
of pain is abolished, the perception of 
pain is not intolerable.” 


Neurophysiological Correlates of Post- 
leucotomy “Pain Relief” 


Since prefrontal leucotomy miti- 
gates the discomfort-suffering com- 
ponent of the pain response in some 
patients and fails to do so in others, 
since this effect is often temporary, 
and since we are rarely certain in any 
one such operation which fiber tracts 
were destroyed, to what extent scar 
formation and vascular damage occurred,
and to what extent the operation
resulted in physiological disturbances
in other cerebral areas, it is
extremely difficult to formulate any 
hypothesis concerning the “pain relief”
which m
Also, as Koskoff, Dennis, Lazovik,
and Wheeler (1948) have pointed out:


pathways, despite the ey 
degeneration following 
Preservation of the respon 
lation noted in such pati 
that the interruption of s 
tracts is not responsible fı 
ing (p. 740). 
Nevertheless, a number of investigators
have proposed a variety of
mechanisms which may be directly
or indirectly related to postleucotomy
“pain relief.” Starzl and Whitlock
(1952) have presented evidence
that the “diffuse” thalamic projection
system, which exerts a general
cortical “arousal” effect, projects
primarily, but not exclusively, to the
frontal cortex in monkey. From this
evidence they hypothesize that “pain
relief” following leucotomy is due to
the destruction of the afferent fibers
from this system. Fulton (1951, p.
127) has suggested that frontal leucotomy
relieves pain by removing
“large numbers of visceral pain pro- 
jections from the sphere of conscious- 
ness,” specifically, by destroying 
visceral afferent pathways to the 
orbitofrontal cortex. However, as 
White and Sweet (1955, p. 64) point 
out, “We know of no evidence... 
that stimulus to the central end of 
any visceral nerve carrying many 
nerve fibers, such as the great
splanchnic nerve, will cause synchro- 
nous bursts of change of potential 
within this part of the brain in mam- 
mals.” Also, Fulton’s suggestion does 
not explain why the leucotomized pa- 
tient appears to have a diminished 
responsiveness to many other stimuli
besides noxious stimuli nor does
it explain why the patient may state,
when directly asked, that the “pain
feeling” is the “same” but does not
matter any more.

The above investigators have emphasized
the destruction of the afferent
projections to the frontal areas
and have neglected the probable extensive
destruction of corticofugal
fibers. In a postmortem investigation
of six lobotomized patients, Yakovlev,
Hamlin, and Sweet (1950, p.
328) found that the frontopontine
tracts were bilaterally and symmetrically
degenerated and that “the
great frontal corticofugal pathway 
descending from the entire anterior 
pre-Rolandic half of the cerebral
hemisphere was deprived of a large 
and important component.” They 
conclude with a statement that cannot
be overemphasized:


On the basis of this study it seems to us that
in the attempts made thus far to correlate the
behavioral changes following frontal lobotomy
with anatomy . . . the degeneration of anterior
thalamic radiations and nuclei has been
stressed to the exclusion of the obvious de- 
generation of the far greater mass of efferent 
projections which connect frontal lobes to all 
the levels of the neuraxis . . . (p. 328). 

In line with this suggestion, a few 
workers have attempted to formu- 
late the effects of leucotomy in terms 
of the destruction of efferent projec- 
tions. Bonner et al. (1952) speculate 
that if the connections between neo- 
cortex and archicortex are severed by 
prefrontal leucotomy “there would 
be less activation of the archicortical 
circuits which probably subserve 
emotional reactions and thus per- 
petuate suffering.” Arnold (1955, p. 
154) hypothesizes that, since the
dorsomedial nucleus of the thalamus 
degenerates after prefrontal leucot- 
omy and since the frontal lobes ac- 
tivate the sympathetic centers in the 
posterior hypothalamus by connec- 
tions through this nucleus, “Anxiety 
is reduced because the excitation of 
sympathetic effectors is now pre- 
vented and with it a prolongation 
and intensification of the emotion.” 
However, since sympathectomized 
animals (Cannon, Newton, Bright, 
Menkin, & Moore, 1929) and sym- 
pathectomized human patients (Ray 
& Console, 1949; Grimson, Orgain, 
Anderson, Broome, & Longino, 
1949) apparently respond with nor- 
mal “emotion” and “anxiety” to 
many stimuli, it is doubtful that the 
prevention of sympathetic excitation 
is the mechanism involved in the re- 
duced “anxiety” or diminished reac- 
tivity of the leucotomized patient. 

In summary, although a number of 
neurophysiological mechanisms are 
apparently nonfunctional after pre- 
frontal leucotomy, we cannot state 
with any degree of certainty which of 
these mechanisms are necessary for 
the maintenance of intractable pain. 


Nor are we certain that destruction 
of one or more specific nuclei or nerve 
pathways is closely correlated with 
the effects of this operation. In fact, 
the evidence at present suggests that 
prefrontal leucotomy has different 
effects on different patients even 
when apparently similar neural tissue 
is destroyed. To account for these 
differential effects we must have not 
only (a) much more preoperative 
data on each patient (e.g., the pa- 
tient’s personality characteristics, his 
general level of reactivity, the dura- 
tion of his pain syndrome) but also 
(b) much more specific postoperative 
data such as the specific tracts de- 
stroyed and the extent of postopera- 
tive hemorrhage, and (c) a better 
understanding of a number of phe- 
nomena which at present are not well 
understood, such as the “reintegra- 
tion” of function which may occur 
after destruction of neural tissue, the 
specific functions of the afferent and 
efferent fibers from the frontal areas, 
etc. When such specific data are 
available, we may be able to account 
both for the patient who is not helped 
by this operation and for the pa- 
tient who is not only relieved of pain 
but also of worry and concern about 
many situations including, in many 
cases, forthcoming death. 


THE PROBLEM OF CONGENITAL
INSENSITIVITY TO PAIN 


A theory of pain must account for 
the “normal” response to noxious
stimulation, for the alterations in 
this response by analgesics, place- 
bos, hypnosis, neurosurgical and other 
procedures, and for the antithesis of 
“normal” pain responsiveness, i.e., 
the problem of “congenital insensi- 
tivity to pain.” At the present time 
we are far from a complete under- 
standing of the latter phenomenon. 
Nevertheless, within the limits of the 
evidence as it now stands, certain significant
factors stand out that should
be emphasized.
T ae (1955) and Critchley 
(1956) have recently reviewed the 
handful (ca. 16) of well-documented 
cases of “congenital insensitivity.”
McMurray’s (1950) case can be 
briefly summarized to indicate the 
more or less typical findings in these 
patients: 
A 22-year-old female college stu- 
dent, IQ 128, with no apparent per- 
sonality disorders. A history of con- 
sistent lack of pain responsiveness 
dating at least since early childhood. 
Extensive burns, frostbite, deep cuts, 
and other serious tissue damage “had 
gone unnoticed or been looked on in- 
differently.” Her medical history in- 
cluded the incision of a large abscess 
over the occipital bone, osteomyelitis 
of the right calcaneus and of the left 
femur, tonsillectomy and adenoidec- 
tomy, and acute pyelitis, with no 
complaints of pain or tenderness.
When subjected in the laboratory to
such noxious stimuli as cold water 
at a temperature of 0° to 2° C., hot 
water at 49° 


shock from an inductorium, she did 


rate, 
Extensive neurological examination did
not reveal any evidence of organic
neurological disease.

Although the other reported cases
generally follow the above pattern,
there are some differences: 2 patients
(Ford & Wilkins, 1938, Case 2;
Kunkle & Chapman, 1943) showed
epileptic tendencies; 3 patients
(Kunkle & Chapman, 1943; Arbuse,
Cantor, & Barenberg, 1949; Cohen,
Kipnis, Kunkle, & Kubzansky, 1955)
showed increased diastolic and systolic
blood pressure and increased
heart rate when their hands were 
immersed in water at 0° C.; and at 
least 4 patients were of borderline 
normal or below normal intelligence 
(Ford & Wilkins, 1938, Case 2; 
Arbuse et al., 1949; Farquhar & Sut- 
ton, 1951; Madonick, 1954). 

In many of the reported cases, the 
insensitivity to pain is not an all-or- 
none phenomenon. Many, and possi- 
bly all, of these Ss have at some time 
in their life responded in the normal 
manner to some noxious stimuli. 
Three of Jewesbury’s (1951) cases 
illustrate this exceptionally well. His 
first case showed no response to pin- 
prick or to laboratory pain tests such 
as muscle ischemia and histamine injection;
he was able to pick up glowing
coals without discomfort, and he
had teeth drilled and extracted with- 
out any report of pain; however, the 
patient did, at times, show the nor- 
mal response to noxious stimulation, 
for example, when he had smashed
his fingernail and when he had been
kicked on the testicles. Jewesbury’s
second and third cases also did not
show pain responses in the labora- 
tory tests of muscle ischemia, hista- 
mine injection, and electric shock at 
40 milliamperes (the maximum avail- 
able); however, the second patient 
had frequent frontal headaches and 
showed normal pain responsiveness
during appendicitis and pyelitis and
the third patient had reported pain
from headaches and from retention of 
urine due to an enlarged prostate. 
Similarly, Kunkle and Chapman’s 
(1943) patient had complained of 
“moderate toothache”; Rose’s (1953) 
patient complained of pain after a
vascular accident in his right leg;
Madonick’s (1954) patient and two of
Ford and Wilkins’ (1938) patients
had “abdominal pain”; the patient
of Cohen et al. (1955) had reported
a “throbbing headache” after spinal
anesthesia; and Jéquier and Deller’s
(1956) patient reported “a little
pain” when stimulated with a very 
hot object. 

As Critchley (1956) has noted, 
these Ss are not actually “insensitive”
to noxious stimulation; they
can detect, identify, and localize 
noxious stimuli and can easily dif- 
ferentiate them from other stimuli. 
McMurray’s (1950) S states that, 
when a hypodermic needle is inserted 
into her skin, she feels it penetrating 
the tissue layers but does not “feel
pain.” Stimuli such as pinprick and 
cutaneous shock and heat produce 
the report of a pricking or sharp qual- 
ity, but she does not describe this 
quality as “painful.” In fact, since 
this S can discriminate the sharp
quality of heat stimulation, McMur- 
ray was able to establish in the pa- 
tient a “threshold” close to the nor- 
mal heat pain threshold. Similarly, 
Ford and Wilkins (1938), Kunkle 
and Chapman (1943), Boyd and Nie 
(1949), Jewesbury (1951), Westlake 
(1952), and Jéquier and Deller (1956) 
have reported that their Ss had no 
difficulty differentiating and localiz- 
ing a nociceptive stimulus; they 
could, for example, easily discrim- 
inate between the blunt and pointed 
end of a pin and had no difficulty 
localizing the pinprick. 

The available evidence indicates 
that many, if not all, of these Ss have 
normal peripheral neural apparatus. 
Biopsy specimens from McMurray’s 
patient showed “nerve fibers and free 
nerve endings present. . . . No morphological
features that would distinguish
them from the pain endings of
normal subjects” (Feindel, 1953, p. 
402). Other investigators who at- 
tempted histological studies (Girard, 
Devic, & Garin, 1953; Madonick, 
1954; Cohen et al., 1955) also found
nerve fibers in apparently normal
pattern. 

In many, if not all, of these cases, 
the evidence indicates that no dis- 
tinct localized damage exists in the 
central nervous system. Investi- 
gators who performed extensive neur- 
ological examination of their Ss 
(Boyd & Nie, 1949; Arbuse et al., 
1949; McMurray, 1950; Jewesbury, 
1951; Rose, 1953; Madonick, 1954; 
Jéquier & Deller, 1956) report that 
all tests were essentially normal— 
normal reflexes, normal skull and 
spine X ray, normal pneumoenceph- 
alogram, normal electroencephalo- 
gram, etc. Arbuse et al. (1949) have 
emphasized that there is no indica- 
tion in their case, or in any other re- 
ported case, of a lesion in any specific 
part of the brain. Most investigators 
who have examined these Ss appear 
to be in agreement with De Jong’s 
(1949, p. 411) conclusion that the de- 
fective reaction is more likely due to 
a “generalized or diffuse develop- 
mental anomaly” and that it is 
highly doubtful that any “local le- 
sions” exist. 

In at least three of the reported 
cases the pain insensitivity was not 
due to an irreversible “anomaly.” 
Ford and Wilkins’ (1938) first case 
appeared to be insensitive to pain 
and readily submitted to many seri- 
ous noxious stimuli in the laboratory 
without signs of discomfort; later, 
however, he seemed to be afraid of 
“getting hurt,” refused to have a
tooth extracted without an anes- 
thetic, and generally appeared to be 
becoming more concerned about po- 
tentially pain-producing stimuli. 
Similarly, during the first 24 years of
life, Jewesbury’s (1951) fourth case 
did not show any signs of pain responsiveness
to a wide variety of injuries—serious
burns, bruises, bleeding
fingers, etc. At two years of age
he had been reported in the press as
“the child who knows no pain.”
However, when examined at 32 years 
of age he showed normal pain re- 
sponses to all nociceptive stimuli. 
Rose’s (1953) case also followed a 
similar pattern; Rose reports that 
“his sensitivity to pain is becoming 
progressively nearer the normal and 
he now feels the minor injuries of 
boy’s life as well as any other child.” 
Since each of the reported cases ap- 
pears to differ in some way from 
every other reported case, we cannot 
generalize from the above data. How- 
ever, we are probably safe in tenta- 
tively concluding that some of these 
Ss are able to respond to at least some 
nociceptive stimuli in the normal 
manner, i.e., with the “sensation of 
pain,” discomfort, and alterations in 
some physiological functions, even 
though they almost always fail to do 
so. Also, many, if not all, of these Ss 
can discriminate and localize noxious 
stimuli and easily differentiate these 
stimuli from heat, warm, pressure, 
and touch stimuli. But this “sens- 
ing” of a noxious stimulus is not 
“painful”; very rarely is it associated
with unpleasantness or discomfort.
As Critchley (1956, p. 742) has
pointed out: “The most remarkable 
feature in this syndrome is a typical 
lack of conformity between the feel- 
ing of pain as a discriminative quality 
of sensation, and the registration of 
distress, either overtly or automati- 
cally.” In fact, the available evi- 
dence suggests that some, if not many, 
of these Ss resemble in their pain re- 
sponsiveness the hypnotic “anal- 
gesic” S and the restricted and iso- 
lated animals studied by Melzack 
and Scott (1957) more than they re- 
semble those rare patients with le- 
sions of the afferent apparatus who 
are unable to discriminate a nociceptive
stimulus. The former phenomena,
summarized below, also indicate
that “pain” in the sense of discom- 
fort and suffering is not necessarily 
present when noxious stimuli are dis- 
criminated, differentiated, and local- 
ized. 


THE EFFECT OF EARLY ISOLATION
ON THE PAIN RESPONSE IN 
THE ADULT 


Melzack and Scott (1957) have 
provided much needed data concern- 
ing the effect of early isolation on 
pain responsiveness in the mature or- 
ganism. These investigators reared 
10 dogs in isolation from puppyhood 
to maturity in special cages which 
drastically limited both their over-all 
experience and their specific experi- 
ence with nociceptive stimuli. Com- 
paring the behavior of these re- 
stricted dogs with the behavior of 12 
normally reared dogs, they report the 
following: 

(a) In general, the 10 restricted 
dogs failed to show adaptive and in- 
telligent responses to noxious stimuli. 
Many of the dogs made no attempt to 
avoid a pinprick, a flame, or an electric
shock stimulus. Although some
of the restricted dogs did learn to 
avoid these stimuli, they required 
many more trials than the control 
animals. As long as two years after 
release from isolation, many of the 
restricted dogs continued to show 
maladaptive behavior when given 
noxious stimuli. The investigators 
conclude that “it appears that the 
requisite experience must come at the 
correct time in the young organism’s
life. During later stages of development,
the experience necessary for
adaptive, well-organized responses to
pain may never be properly acquired”
(p. 159).

(b) The restricted animals ap- 
peared to be unable to localize the 
source of the noxious stimulus. Not 
only were the stimuli “not ‘perceived’
as coming from the experimenter”
but the dogs also appeared
to be “unaware that they were being 
stimulated by something in the en- 
vironment” (p. 158). 

(c) Although the restricted ani- 
mals may have “felt” the nociceptive 
stimuli “in some way,” they rarely 
showed discomfort or suffering: 


. Their reflexive jerks and movements dur- 
ing pinprick and contact with fire suggest 
that they may have “felt something” during
stimulation; but the lack of any observable 
emotional disturbance apart from these reflex 
movements in at least 4 of the dogs following 
pinprick and in 7 of them after nose-burning 
indicates that the perception of the event was 
highly abnormal in comparison with the behavior
of the normally reared control dogs. . . .
The results suggest that the restricted dogs 
lacked awareness of a necessary aspect of nor- 
mal pain perception; the “meaning” of physi- 
cal damage or at least threat to the physical 
well-being (p. 159). 


Additional investigations are 
needed to determine the validity of 
the following hypothesis suggested by 
this study: some components of the 
normal pain response (local reflex 
movements and “the sensation of 
pain”) do not require prior experi- 
ence with noxious stimuli; other com- 
ponents of the pain response (localiz- 
ing the stimulus, purposive with- 
drawal movements, and discomfort- 
suffering) require previous experience 
with such stimuli. 


HYPNOTICALLY-INDUCED 
“ANALGESIA” 


The experimental evidence, sum- 
marized by Weitzenhoffer (1953), in- 
dicates that, when given appropriate 
suggestions to induce “analgesia,” 
some “good” hypnotic Ss do not show 
a pain response to some noxious stim- 
uli, that is, they do not give a verbal 
report of pain, they do not withdraw 
from the stimulus, they do not show 
discomfort by wincing, tremor, or
restlessness, and they do not show 
significant alterations in blood pres- 
sure, heart rate, pulse rate, or res- 
piration. Dynes (1932) reported that 
following pinprick during hypnoti- 
cally-induced “anesthesia” seven 
“trained” hypnotic Ss denied that 
the stimuli were painful, did not show 
withdrawal or facial flinch, and 
showed little or no disturbance in the 
normal rate and rhythm of respiratory
and cardiac activity. Subsequently,
Dynes’ Ss were asked (by
someone other than the experimenter)
to “fake a trance” during the
following experiment but not to 
“enter hypnosis.” In this situation, 
pretending they were in trance, they 
showed all of the normal responses 
to the nociceptive stimuli. In a sim- 
ilar study, Sears (1932) recorded the 
responses of seven “good” hypnotic 
Ss to a sharp steel point pressed 
against the leg for 1 sec. with a pres- 
sure of 20 oz. Suggestions of anal- 
gesia were given for the left leg and 
the right leg was employed as a con- 
trol. When the stimulus was applied 
to the “analgesic” left leg, the Ss did 
not show facial flinch or variations in 
respiration and the increased pulse 
rate, which normally follows nocicep- 
tive stimulation, was significantly 
decreased. However, they did show 
these responses when the stimulus 
was applied to the “normal” right leg. 
In further control experiments, when 
the Ss were asked to inhibit all reac- 
tions to the noxious stimulation, all 
Ss showed alterations in pulse and 
respiration. Doupe, Miller, and Keller
(1939) have in general confirmed
these findings, reporting that their
hypnotic Ss showed a slight alteration
in respiratory rhythm, no significant
change in pulse rate, and no
facial grimace when multiple pinpricks
were applied to the “analgesic”
arm. Brown and Vogel (1938)
also found that three Ss showed less 
variability in blood pressure, pulse, 
and respiration when nociceptive 
stimuli (lancet, thumb tack, and 
water at 49° C.) were applied to the 
“anesthetic” limb than when the 
same stimuli were applied to the 
“normal control” limb. Although
they conclude that “physiological 
reactions to moderate and mild sen- 
sory stimuli may be affected by sug- 
gestion in the hypnotic state and by 
imagination in the waking state,” it 
is not clear from their report to what 
degree these responses were affected. 
Although the experimental studies 
generally report either a complete 
lack or a significant decrease in vasomotor
and respiratory alterations following
nociceptive stimulation during
hypnotically-induced “analgesia,”
they report completely contradictory
results with the galvanic
skin response (GSR). Some investi- 
gators (Georgi, 1921) found that the 
GSR to noxious stimulation was com- 
pletely eliminated during hypnotic 
“anesthesia”; others (West, Niell, & 
Hardy, 1952) concluded that it is at 
times significantly decreased over the 
normal and at other times completely 
eliminated; and still others (Levine, 
1930; Barber & Coules, 1959) re- 
ported that the GSR to noxious 
stimuli is not significantly altered 
after hypnotic suggestions of anal- 
gesia. However, an extensive group 
of investigations indicate that the 
GSR is the least specific of all the 
physiological responses which may 


5 Doupe et al. also found that, in compari- 
son with the normal limb, the hypnotically 
“anesthetic” limb showed a reduced vasoconstrictor
response to pinprick. They are
uncertain, however, whether this “residual
response” is “of the nature of a spinal reflex”
or due to “sub-conscious or co-conscious activities”
on the part of the S.


follow noxious stimulation. Although 
blood pressure alterations, for ex- 
ample, are at times present when S 
is responding to nonpainful stimuli, 
some variation in blood pressure ap- 
pears to be always present when S 
does “feel pain” (Nafe & Wagoner, 
1938; Goetzl, Bien, & Lu, 1951). This 
is not true of the GSR, however. An 
S may show a GSR when he does not
“feel pain” and he may not show a
GSR when he does “feel pain.”
Brown and Vogel (1938) demon- 
strated that hypnotic Ss often showed 
a GSR when there was no doubt that 
they did not “feel pain,” that is, when 
noxious stimuli were applied to an 
area made insensitive by novocain 
block. They write that “light appli- 
cation of the pin point [to the area in 
which novocain had been injected] 
. . . appreciated as touch, caused
large galvanometer deflections” (p. 
419). Along similar lines, Levine 
(1930) and Barber and Coules (1959), 
using hypnotic Ss, and Sattler (1943), 
using nonhypnotic Ss, found that 
Ss often show a GSR when they are 
told they are to be given a pain- 
ful stimulus but are not given the 
stimulus. West et al. (1952) found 
that (a) the GSR showed a significant 
decrease over the control levels for 
all seven of their hypnotic Ss even 
when “there was no alteration in
pain perception, according to subjec-
tive reports,” and (b) during the con-
trol periods a stimulus “evoking a
pain of 6 or 7 dols” at times failed to
produce a GSR. After a careful,
long-term investigation designed to
determine the relationship of the
GSR to the pain response, Furer and
Hardy (1950) concluded that the
GSR is directly related to the
“threat-content” of the stimulus and
is not related to the “sensation of
pain” as such. Following Furer and
Hardy’s interpretation, we can con-
clude that studies which have re-
corded the GSR during hypnotically-
induced “analgesia” indicate that to
some hypnotic “analgesic” Ss the
noxious stimuli are “threatening,”
to others they are less “threatening”
than during the control period, and
to still others they are not “threaten-
ing” at all.

However, we cannot draw any con- 
clusions from the above studies as 
to the effectiveness of hypnotic pro- 
cedures when the stimuli are more 
severe and of longer duration. For 
this type of report we must turn to 
the clinical investigations. In gen- 
eral, the clinical reports suggest that 
hypnotic methods, with some patients,
may be as effective as morphine and 
other opiates in minimizing patho- 
logical pain syndromes and in miti- 
gating or totally eliminating the dis- 
comfort-suffering component of the 
pain response during a variety of 
surgical procedures. A typical report 
of the surgical use of hypnotic tech- 
niques is Mason’s (1955) discussion 
of a case of mammaplasty: during the 
operation, which consisted of exci- 
sion of breast tissue, skin, fat, and 
complete reshaping of the breast, the 
patient “never showed signs of pain
or seemed distressed” and the pulse 
and blood pressure showed very lit- 
tle, if any, alteration. Kroger (1957) 
has also reported four cases which 
are more or less typical of the surgical 
findings. The first case, a 20-yr.-old 
female, had “a fairly large tumor” re-
moved from the right breast without
preoperative or operative medica-
tion. She showed “no indication of a
pain reflex at any time” and she was
“fully aware of the entire surgical
procedure.” Another patient, who
underwent Caesarean section and
hysterectomy with hypnotic “anes-
thesia,” “experienced no subjective
discomfort and conversed with every-
body in the operating room. She was
fully conscious and was able to watch
the birth of her baby. There was no
discomfort when the baby was de-
livered by forceps, or when the uterus
was extirpated.” Other studies, re-
cently summarized by Barber
(1958b), also report that hypnotic
methods are successful with some pa-
tients in minimizing or completely
eliminating the discomfort-suffering
component of the pain response dur-
ing childbirth, terminal cancer, fatal
burns, dysmenorrhea, and other pain
syndromes.

It should be emphasized, however, 
that in the more severe and intracta- 
ble pain syndromes, such as terminal 
cancer and spinal cord injuries, hyp-
notic methods are reported to mini-
mize discomfort and suffering; rarely,
if ever, are these procedures reported 
to completely eliminate the total pain 
response to the ever-present noxious 
stimulus in the patient's body. Dor- 
cus and Kirkner (1948) found that al- 
though hypnotic methods could min- 
imize discomfort in five cases of spinal 
cord injury—i.e., the patients re-
ported less pain and requested a
smaller amount of drugs—these 
methods were by no means effective 
in entirely eliminating the pain re- 
sponse. Similarly, Butler (1954) re- 
ported that hypnotic methods were 
effective with some patients in min- 
imizing discomfort during terminal 
cancer—the patients either required 
half of their usual amount of mor- 
phine or, in a few cases, did not re- 
quire any drugs for a period of time. 

The evidence available at present 
indicates that two objections which
have been raised concerning the ef-
fectiveness of hypnotic procedures in
“relieving pain” are not valid. Hull
(1933) was of the opinion that hyp-
notic Ss may state, after the experi-
ment, that they did not “feel pain”
during the experiment because am-
nesia was suggested and they
simply do not remember. However,
in a number of recent studies (Rosen, 
1951; Mason, 1955; Kroger, 1957) 
posthypnotic amnesia was not sug- 
gested and the patients continued to 
insist that they had not “felt pain” 
even when they were perfectly able 
to recall the entire procedure. Others 
have objected that hypnotic Ss ac- 
tually “feel pain” but deny it (when 
questioned by the hypnotist) be- 
cause of their rapport or strong 
“transference” relationship with the 
hypnotist. This also seems doubtful. 
Whenever any of the above patients 
were questioned afterwards by dis- 
interested observers, they continued 
to vehemently deny “feeling pain” 
during the procedure (Marcuse & 
Phipps, 1956; Kroger, 1957). 

Before we can state what are the 
necessary and sufficient conditions 
for hypnotic “analgesia,” we need 
more extensive, controlled experi- 
ments utilizing a wide variety of 
noxious stimuli applied to visceral, 
somatic, and cutaneous structures.
Recent investigators, however, have
emphasized three conditions which
may be among the necessary condi-
tions for this phenomenon.

First of all, it seems that the S must 
be a certain type of person who is 
able to become “deeply hypnotized.” 
With few, if any, exceptions, investi-
gators agree that these individuals (us-
ually termed somnambulists) are a
small minority—5 to 25%—of the
population, at least in our culture
(Weitzenhoffer, 1953, p. 59; Butler,
1954; Mason, 1955; Kroger, 1957).
The limited evidence available at
present suggests that these indi-
viduals are characterized by a num-
ber of distinct “abilities.” Young
(1928, p. 372) found that one or more
of the following characteristics
showed themselves in all of his “best”
Ss long before they were “hypno-
tized”: “deep abstraction, reverie
amounting almost to ecstasy, putting
oneself to sleep at will, actually hyp-
notizing one’s self.” Similarly, Bar-
ber (1958b, 1958c) found that all of
his somnambulistic Ss had been able
since childhood to go to sleep easily
and quickly at anytime—day or
night—and to concentrate on their
work or studies by “blocking-out”
irrelevant stimuli.

What appears to be a second neces-
sary condition for hypnotic “anal-
gesia” has been formulated by Leuba
(1957) as follows: 

“There must be concentration on the ideas
presented by the hypnotist and with a mini-
mum of counter or critical thoughts; and a
belief that what the hypnotist says will hap-
pen, can actually happen, and will happen. In
other words, there must be a set or attitude to
accept the hypnotist’s statements completely
and uncritically” (p. 37).


Along similar lines, Kroger (1957, p.
xi) has concluded from his extensive
experience with hypnotic procedures
that “when one wishes to perform
major surgery under hypnoanesthesia
... it is very important to get the
patient to believe in the actuality of
the trance state.” Recent evidence
indicates that these statements may
be valid. Barber (1957b) reported
that a somnambulistic hypnotic S
(who quickly and easily carries out
all of the “complex” hypnotic behav-
iors such as analgesia, age-regression,
negative and positive hallucinations,
etc.) becomes “unhypnotizable”
when he no longer “believes” in hyp-
nosis, i.e., when he concludes from
his own reading, or from training in
autohypnosis,7 that the hypnotist
does not possess any special power or
ability and that whatever occurs dur-
ing the hypnotic situation is brought
about primarily by the subject him-
self. A number of investigators have
also reported that somnambulistic Ss
(in the “deepest stage of hypnosis”)
do not carry out “complex” hypnotic
behaviors such as color blindness
(Erickson, 1939), age-regression
(Orne, 1951), immoral or dangerous
behavior (Young, 1952), and nega-
tive hallucinations (Barber, 1958a)
when the hypnotist simply gives
them the appropriate suggestions;
however, they do carry out such be-
havior when the hypnotist manipu-
lates the situation in such a way as to
lead the Ss to “believe” that the sug-
gestions are literally true statements.

7 Shor, R. E. Personal communication.

A third factor which seems to be 
closely related to the above, has been 
recently emphasized by physicians 
attempting to relieve the pain of 
terminal cancer or childbirth by hyp- 
notic methods; the patient must have 
confidence in his physician-hypnotist 
and the hypnotist must “give of him- 
self” to the patient. In treating the 
pain of terminal cancer by hypnotic 
procedures, Butler (1954) saw his pa- 
tients at least daily and often two to 
four times a day. Whenever hypno- 
therapy was terminated for any 
length of time, the patients all showed 
a return of the original pain syn- 
drome. However, in the few cases 
when hypnotic procedures were dis- 
continued but the patient received the 
same amount of personal attention 
from the physician, the patients did 
just as well for one or two days as 
they did during “hypnosis.” Butler 
emphasizes that in treating the pain 
of terminal cancer by hypnotic meth- 
ods the hypnotist-physician “gives of 
himself to the patients. ... Even an
hour’s treatment with a very sick pa-
tient can produce an appreciable tir-
ing of the hypnologist, and, as the
sympathetic bond between the two
grows stronger, the hypnologist may
even ‘feel’ the symptoms he is trying
to eradicate from the patient” (p. 6).
Along similar lines, Winkelstein (1958,
p. 154) concluded, after using hyp- 
notic methods over a 2-yr. period 
with 200 of his obstetrical patients, 
that “the mental attitude of the pa- 
tient, the patient-obstetrician rap- 
port, and the confidence of the pa- 
tient in the procedure as well as in 
the accoucheur, seemed to be as im- 
portant factors as was the hypno- 
suggestion itself.” 

In summary, the evidence available 
at the present time indicates that 
when the hypnotist properly manipu-
lates the situation some “good” hyp-
notic Ss show a mitigated pain re-
sponse to some noxious stimuli,8 that
is, (a) they do not show withdrawal 
or avoidance, (b) they report that the 
stimuli are not painful, (c) they do 
not show discomfort and (d) they 
do not show physiological responses 
such as vasomotor and respiratory 
alterations (although they may or 
may not show galvanic skin re- 
sponses). The evidence also suggests 
that Ss who are able to carry out the 
above are “set” to accept the hyp- 
notist’s suggestions as literally true 
statements and have complete confi- 
dence in the hypnotist and in the 
efficacy of hypnotic procedures.9


8 It should be emphasized that the hypnotic
“analgesic” S, like the leucotomized, nar-
cotized, or congenitally insensitive patient, is
able to discriminate, differentiate, and localize
the noxious stimulus when asked to do so
(Rosen, 1951). Although he can “sense” the
stimulus, it does not arouse discomfort.

9 As will be pointed out below, the “pain
relief” which at times follows the administra-
tion of a placebo is also closely related to the
patient’s belief or conviction that the “drug” has
curative properties. Apparently, at least some
of the effectiveness of hypnotic “analgesia”
is due to a “placebo effect.”

However, the hypnotic “analgesic” S also
resembles the patient who has received mor-
phine or other opiates. As pointed out below,
morphine apparently gives “pain relief” by
bringing about “freedom from anxiety” or “a
bemused state.” These terms also appear ap-
plicable to the “good” hypnotic S who be-
comes relatively unconcerned about and
“relatively inattentive to all stimuli except the
words of the operator and stimuli to which
the operator specifically directs his attention”
(Barber, 1957a).

To what extent hypnotic “analgesia” is due
to each of these seemingly different mecha-
nisms is a subject for further research.


THE INCONSTANT PAIN THRESHOLD


Recent studies indicate that both
morphine and placebos can eliminate
discomfort and suffering without al-
tering the pain threshold and without
affecting “the sensation of pain.”
Since many of the studies on placebos
and analgesic drugs are intimately re-
lated with the pain threshold studies,
we shall first review the latter investi-
gations.

When the subject of pain was last
reviewed in this journal (Edwards,
1950) it appeared that Hardy, Wolff,
and Goodell (1940) had established
that the pain threshold was relatively
constant in the same S at different
times. Using what was later to be-
come known as the Hardy-Wolff-
Goodell radiant heat technique10 and
using themselves as Ss, these investi-
gators reported that when pain thresh-
old measurements were taken almost
daily for nearly a year, the average
threshold value was 232 mc./sec./
cm.² with a standard deviation of
only ±9 millicalories. In addition,
they reported that all observations
were within ±12% of the mean. It
also appeared that the same workers
(Schumacher, Goodell, Hardy, &
Wolff, 1940) had established that a
large group of untrained Ss had ap-
proximately the same pain threshold.
They reported that the average thresh-
old for 150 untrained Ss was 206
mc./sec./cm.² with a standard devi-
ation of only ±21 millicalories and a
range extending only from 173 to
232 millicalories. Subsequent investi-
gations, however, have failed to con-
firm both of the above conclusions;
it now appears that there is a wide
variation in pain threshold among a
group of Ss and that the threshold
is by no means consistent in the
same S over time.

10 For a detailed description of this method
see Hardy et al. (1952, pp. 67-85).
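As an illustrative aside, the spread reported in these two early studies can be expressed as a percentage of the mean (a coefficient of variation), which makes the two sets of figures easier to compare with the later, wider estimates. The sketch below is not part of the original investigations; the function name and the dictionary labels are assumptions introduced here, and only the means and standard deviations are taken from the text above.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the standard deviation expressed as a percentage of the mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return 100.0 * sd / mean


# Means and standard deviations as quoted in the text (mc./sec./cm.^2).
# The labels are illustrative only.
reported = {
    "Hardy et al. (1940), investigators as Ss": (232.0, 9.0),
    "Schumacher et al. (1940), 150 untrained Ss": (206.0, 21.0),
}

cv_by_study = {
    label: coefficient_of_variation(mean, sd)
    for label, (mean, sd) in reported.items()
}

for label, cv in cv_by_study.items():
    print(f"{label}: SD is {cv:.1f}% of the mean")
```

On these figures the standard deviation is roughly 4% of the mean for the trained observers and roughly 10% for the untrained group, which is the sense in which the early reports described the threshold as relatively constant.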

Using the Hardy-Wolff-Goodell ra- 
diant heat technique, Chapman and 
Jones (1944) found that the pain 
thresholds of 200 Ss varied between 
—40 to +50% of the mean and 
Kuhn and Bromiley (1951) reported 
that the pain thresholds of 37 Ss 
ranged from 169 to 296 millicalories 
with a standard deviation of 31.9. 
Hall and Stride (1954), using a modi- 
fied Hardy-Wolff-Goodell technique, 
found that the pain threshold of 400 
psychiatric patients (neurotics, de- 
pressives, and schizophrenics) ex- 
tended “over almost the whole range 
of stimulus intensity” with the mean 
at 260 millicalories and a standard 
deviation of 72 +45. The depressives 
and schizophrenics reported pain at 
a uniformly high level of stimulus 
intensity while the anxiety neurotics 
consistently reported pain at low 
stimulus intensities. Since the pain
threshold, but not the warmth thresh-
old, could be easily altered by vary-
ing the instructions, Hall and Stride
suggest that pain threshold varia-
tions are due to “central attitude or
pain-conceptualization” and not to
differences in peripheral sensitivity.
Five additional studies, using the Har-
dy-Wolff-Goodell technique, which
also failed to find consistency in the
pain threshold have been recently re-
viewed by Beecher (1957).

Other workers using other methods 
have also found wide variability in the
pain threshold. Although Hardy et
al. (1940) had reported that all 
threshold measurements of their 3 Ss 
(themselves) fell within ±12% of
the mean, Lanier (1943), using an 
electric shock stimulus, found that 
the threshold of 15 college women 
showed a variation around the mean 
of —80 to +300%. He also found 
that some Ss showed a relatively 
constant threshold while others 
showed wide variations in their pain 
threshold at different times. Clark 
and Bindra (1956), using thermal, 
electrical, and mechanical stimuli, 
have demonstrated wide individual 
differences in the pain threshold of 
46 untrained Ss. They attribute these 
variations to “attitudinal” variables 
such as the definition of pain, set, 
anxiety, and timidity. 

After reviewing the many investi- 
gations in this area, Beecher (1957, 
p. 128) writes that “a survey of the 
abundant literature on the subject 
presented above forces one to con- 
clude that the pain threshold is not 
constant from one individual to an- 
other nor even in a given individual 
from one time to another.” Simi- 
larly, Kutscher and Kutscher (1957) 
conclude, after reviewing the litera- 
ture, that the pain threshold varies 
widely among human beings, pro- 
vided that a sufficiently large group 
of Ss is tested. 

The second conclusion that appears 
to have been established by these 
investigations is that the pain thresh- 
old can be easily influenced by vary- 
ing the instructions (Hall & Stride, 
1954), by a wide variety of “dis-
tractions,” and by placebos, anal-
gesics, and hypnosis. Wolff and Good-
ell (1943) had earlier demonstrated
that placebos, in some cases, elevated
the pain threshold as much as 95%,
that the distraction caused by retain-
ing and repeating from 5 to 9 digits
raised the threshold as much as 45%,
and that “shallow hypnosis” ele-
vated the pain threshold by 40%.
Subsequent work on the effects of
analgesics and placebos on pain thresh-
old will be discussed in the following
section of this paper.

Kutscher and Kutscher (1957) 
have noted that the pain threshold 
can be significantly influenced by 
the operator administering the test. 
A report by Denton and Beecher 
(1949) indicates that this observation 
may be valid. Having failed to find 
any consistent effect of analgesic 
agents on pain threshold in trained 
subjects, these investigators re- 
quested the service of an individual 
who had had wide experience with 
the Hardy-Wolff-Goodell method. 
They found that this operator re- 
ported consistent elevations in the 
threshold, after the administration 
of an analgesic drug, when he knew 
which drug—a placebo or an anal- 
gesic—had been administered; how- 
ever, when he did not know whether 
an effective drug or a placebo had 
been administered, he was unable to 
report any consistent threshold ele- 
vation. 

That the pain threshold can be 
readily influenced by a wide variety 
of factors is not surprising if we stop 
to consider that determination of the 
human pain threshold does not even 
remotely resemble the determination 
of threshold responses of nerve fibers, 
nerve trunks, or other isolated physi- 
ological units; determination of the 
human pain threshold obviously re- 
quires judgment or interpretation on 
the part of the S. The S must inter- 
pret the stimulus in accordance with 
his concept of pain and interpreta-
tion clearly depends on S's previous
life-history and [...]
history in responding to similar
stimuli. In fact, the Hardy-
Wolff-Goodell method requires more
than that S simply judge when he
first becomes aware of a stimulus;
he is required to determine when the 
stimulus first undergoes a qualitative 
change. Operationally, the Hardy- 
Wolff-Goodell “pricking pain thresh- 
old” refers to the S’s judgment that 
the feeling of warmth and heat has 
“swelled” and “drawn together” into 
a “very small” and “barely percepti- 
ble prick” at the “exact end of the 
3-sec. exposure to the stimulus” 
(Hardy et al., 1952, p. 81). This 
“pricking” feeling must be inter- 
preted by S as different not only 
from the warmth and heat which pre- 
cede it but also from the “burning” 
which may be simultaneously pres-
ent. It would indeed be surprising if
such an intricate judgment could not
be influenced by a wide variety of
factors.


THE EFFECT OF OPIATES ON THE PAIN RESPONSE


As pointed out above, morphine
and other opiates give “pain relief” 
without necessarily altering the pain 
threshold. Although Wolff, Hardy, 
and Goodell (1940) reported that the 
pain threshold is consistently ele- 
vated after morphine, subsequent in- 
vestigations failed to confirm this 
conclusion. Andrews (1943), Chap- 
man and Jones (1944), Denton and 
Beecher (1949), Isbell (cited by 
Wikler, 1950), Javert and Hardy 
(1951), and Kuhn and Bromiley 
(1951) found that after an analgesic 
dose of morphine the pain threshold 
may be elevated, may be lowered,
or may remain unchanged.

A related line of research com- 
paring placebos and analgesic drugs 
arrived at similar results. Denton and 
Beecher (1949) found, using the 
Hardy-Wolff-Goodell method, that 
a placebo had the same effect on
pain threshold as an analgesic dose
of morphine. Similarly, Birren, 
Schapiro, and Miller (1950) reported 
that a placebo (lactose) had the same 
effect on pain threshold as 0.6 gm. 
of acetylsalicylic acid and sodium 
phenobarbital. Isbell (cited by Wik-
ler, 1950) also found no significant
difference in the effect on the pain 
threshold when Ss received morphine 
and when they received a placebo 
but were told they were being given 
morphine. Beecher (1957) has re- 
viewed 10 additional investigations 
which also indicate that morphine 
and other opiates (a) do not neces- 
sarily elevate the pain threshold when 
they give “pain relief” and (b) if and 
when they do elevate the pain thresh- 
old, they do so to the same extent 
and possibly in the same way as 
placebos.

The evidence also indicates that
opiates can give “pain relief” with-
out altering “the awareness of pain”
or “the pain sensation.” Cattell
(1943) has summarized the data in-
dicating that “awareness of pain” is
not necessarily altered by narcotics.
Wolff et al. (1940, p. 677) have em-
phasized that after morphine admin-
istration “the pain sensation is per-
ceived and is recognized as pain with
no difficulties.” Apparently, “the
sensation of pain,” in itself, is not
necessarily “painful.” “The sensa-
tion of pain” may be completely un-
affected by morphine (and placebos,
hypnosis, prefrontal leucotomy, etc.)
and yet discomfort and suffering are
no longer present.

The “pain relief,” i.e., the mitiga-
[...]
ment,” and “a bemused state.” This
viewpoint is perhaps best epitomized 
by Beecher (1957, p. 152) who writes 
after extensive clinical and experi- 
mental experience with analgesic 
drugs that “perhaps one can con- 
clude that the narcotics really alter 
pain perception very little but do 
produce a bemused state, comparable 
to distraction, which they can be 
alerted out of and will then report
on the little altered pain perception.” 
Along similar lines, Hill, Kornetsky, 
Flanary, and Wikler (1952a) have 
hypothesized that the “pain relief” 
following morphine administration is 
a consequence of a more generalized 
effect which they term relief of 
“anxiety” or “fear of pain.” They 
tested this hypothesis by studying 
the effect of subcutaneous injection 
of 15 mg. of morphine on S’s ability 
to judge the intensity of electric 
shock stimuli under two conditions: 
(a) when Ss were made “anxious” 
by not “familiarizing them with the 
potentially fear-inspiring experimental
situation,” and (b) when “anxiety” 
was allayed by “reassurance, demon- 
stration, and explanation.” They 
reported the following: 

(a) Under conditions which promote anx-
iety or fear of pain, subjects tend to overesti-
mate the intensities of painful stimuli; (b)
morphine reduces such anxiety; (c) under
conditions in which anxiety is largely elimi-
nated, little if any overestimation of the in-
tensities of painful stimuli occurs; (d) mor-
phine does not affect the ability of subjects to
accurately estimate the intensities of painful
stimuli when anxiety is dissipated (p. 479).


Corroborative data were obtained 
in another study by the same group 
of investigators (Hill, Kornetsky, 
Flanary, & Wikler, 1952b). In an 
additional follow-up experiment, us- 
ing thermal stimuli, Kornetsky (1954) 
also confirmed these results and con- 
cluded that morphine appears to be 
effective as an analgesic agent only
when “anxiety” is present.

In summary, the investigations on 
narcotics suggest a similar conclusion 
as the investigations on prefrontal 
leucotomy and hypnosis which were 
summarized in an earlier section of 
this paper: discomfort and suffering 
are not inevitably associated with 
noxious stimulation; they appear to 
be components of a secondary “reac- 
tion to” the stimulus (which has 
been conceptualized as “anxiety” or 
“fear of pain”) which can be mini-
mized or eliminated by opiates, hyp-
nosis, placebos, prefrontal leucotomy,
and other procedures.


THE PLACEBO EFFECT 


The effect of placebos on the pain 
response deserves further comment. 
Jellinek (1946) reported that 60% of 
199 patients with chronic headaches 
received “relief” from a placebo on 
one or more occasions. In extensive 
studies of severe, steady, postopera- 
tive wound pain, Beecher (1955) and 
his collaborators (Lasagna, Mosteller, 
von Felsinger, & Beecher, 1954) found 
that about 35% of their patients re- 
ceived “satisfactory” relief from a 
placebo.11 (“Satisfactory relief” is
defined by these workers as “50 per 
cent or more relief of pain at 45 and 
90 minutes after the administration 
of the agent.”) Houde and Wallen- 
stein (1953) and Keats (1956) have 
carried out similar studies and have 
confirmed the findings of the Beecher 
group.

11 This finding does not indicate that pla-
cebos are only 35% as effective as morphine.
Morphine, in maximum safe dosages, results
in “satisfactory” postoperative pain relief in
only 75% of the same group of patients
(Lasagna & Beecher, 1954). The placebo,
therefore, is about half as effective as mor-
phine in the same situation and among the
same patients.

How does a placebo relieve chronic
headache or minimize the suffering
associated with a postoperative
wound? As a first approximation to
an answer, it seems difficult to dis-
agree with Wolf’s (1950, pp. 106-108)
conclusion:


“placebo” actions depended for
[...] the conviction of the patient
that this or that effect would result.... The
fact that “placebo effects” occur depends, of
course, on the generalization established re-
peatedly by numerous workers that the
mechanisms of the human body are capable of
reacting not only to direct physical and chemi-
cal stimulation but also to symbolic stimuli,
words and events which have somehow
acquired special meaning for the individual.


In general, the above studies and 
the many other studies on the effects 
of placebos on physiological func-
tions and in psychotherapeutic situa-
tions, reviewed by Beecher (1955),
Rosenthal and Frank (1956), and 
Kurland (1957) indicate that the 
placebo reactor is responding to a 
“drug” which he believes has cura- 
tive properties. This belief appears 
to be a function of many factors: 
what the physician specifically tells 
the patient about the “drug,” the 
patient’s previous experience with 
drugs, his previous experience with 
physicians, his specific experience 
with the physician giving him the 
“drug,” etc. The placebo response 
may be viewed as a direct function 
of “the stimulus”: however, “the 
stimulus” is not the ineffective, inert 
compound but the entire situation 
which includes the “drug,” the words 
of the physician, and the patient’s
previous experience with physicians
and drugs. 

Placebo research is still in its in- 
fancy. As Kurland (1957) has pointed 
out, the effect of the placebo is
usually stated in general terms, the 
duration of reactivity is usually not 
specified, and specific physiological 
measures are rarely reported. Studies 
of the differential effect of placebos
are also rare. Are some persons
more prone to respond to placebos?
Lasagna et al. (1954) studied 27 post- 
operative patients with the Ror- 
schach, the TAT, the Wechsler-Belle- 
vue Vocabulary Subtest, and a ques- 
tionnaire filled out by the nurses on 
the wards. The 11 consistent placebo 
reactors differed from the 16 patients 
who never received pain relief from 
a placebo in a number of character- 
istics, among which were the follow- 
ing: 

The reactors were more productive of re-
sponses, more anxious, more self-centered and
preoccupied with internal bodily processes,
and more emotionally labile. They are indi-
viduals who seem more dependent on outside
stimulation than on their own mental proc-
esses. These processes tend to be less mature
than in the case of the non-reactors. The
reactors are in general individuals whose in-
stinctual needs are greater and whose control
over the social expression of these needs is
less strongly defined and developed than in
the non-reactors . . . (p. 775).


However, Wolf, Doering, Clark, 
and Hagens (1957) contradict these 
conclusions: finding that intra-indi-
vidual variations in response to place-
bos are as great as interindividual
variations, they conclude that the
placebo reactor cannot be predicted
from a knowledge of the S's char-
acteristics. An interesting field of
research has been opened for further 
inquiry. 


CONCLUSIONS


The investigations summarized 
above suggest the following conclu-
sions which may be significant for a
theory of pain: 

1. The generally accepted view, 
that “pain” has its “own” peripheral 
receptors and its “own” pathways 
in the central nervous system, is mis-
leading. Nociceptive stimuli activate
various types of nerve fibers which
travel in more than one pathway in
the spinal cord and brain stem and
which project by thalamic and extra- 
thalamic pathways to wide areas of 
the cortex. 

2. The response to a nociceptive 
stimulus is apparently brought about 
when a spatiotemporal pattern of 
neural activity set off by the noxious 
stimulus reaches segmental and supra- 
segmental centers. The pattern of 
neural impulses set off by noxious 
stimuli differs from the neural pat- 
tern set off by other stimuli in that 
the relative number of fibers of differ- 
ent sizes activated differ, and the 
relatively different fibers activated 
carry impulses of different energy 
value, of different frequency, and of 
different duration. 

3. “Pain” in the sense of discom-
fort and suffering is not necessarily
present when noxious stimuli are 
discriminated, differentiated, and lo- 
calized. The few cases which have 
been reported of “congenital insensi- 
tivity to pain” suggest that an indi- 
vidual may be able to “sense” a 
noxious stimulus—i.e., may be able 
to discriminate and localize the stimu- 
lus and differentiate it from other 
stimuli—and yet not show with-
drawal movements, physiological al-
terations, or discomfort. Also, dis-
comfort and suffering can be mini-
mized or totally eliminated in some
Ss by placebos, opiates, prefrontal
leucotomy, and hypnotic procedures
without necessarily altering the “sen-
sation of pain” or elevating the pain
threshold.

4. The mitigation of discomfort- 
suffering by prefrontal leucotomy, 
opiates, and, to some extent, hyp- 
nosis, appears to be secondary to 
a more generalized effect of these 
procedures. Prefrontal leucotomy 
“alleviates worry and concern” and 
“relieves anxiety”; morphine gives 
“freedom from anxiety” and brings 
about “contentment” and “a be- 
mused state”; the hypnotic S is re- 
lieved of pain when he becomes 
“relatively inattentive and uncon- 
cerned about all stimuli to which the 
hypnotist does not specifically direct 
his attention.” These terms appear 
to refer to a common behavioral ma- 
trix: a mitigated “readiness to re-
spond” to stimulation. Apparently,
discomfort and suffering follow noci- 
ceptive stimulation when the S “at- 
tends to” and “reacts to” the stimu- 
lus. Minimize this readiness to re-
spond and “the sensation of pain” is
no longer “painful”; it can become an
isolated “sensation” unaccompanied
by discomfort.

Few statistical methods are better
known to research workers in the be- 
havioral sciences than the simple, or 
two-way, contingency table with its 
corresponding chi square test for as- 
sociation between the two classifica- 
tions. It has served and will serve a 
unique function in research by facili- 
tating the analysis of data, either 
when we have little knowledge of the 
underlying quantitative properties or 
rather when we deliberately choose 
the method for a cursory analysis. 
Also, there are, perhaps, instances 
when it provides the only method of 
analysis. Kelley (1947, p. 311) has 
listed the conditions under which 
data are placed in categories: 
“(1) when a quantitative relationship 
in classes is not known to exist; 
(2) when the quantitative relation- 
ship is only vaguely surmisable; and 
(3) when the known quantitative re- 
lationship between classes is neg- 
lected because the more primi- 
tive and simple qualitative methods 
would seem to suffice.” 

Despite the widespread use of con- 
tingency analysis during its first four 
decades, the method until recently 
was applicable only to limited kinds 
of research problems and data. Since 
World War II many contingency 
techniques have been developed. It 
is the purpose of the present paper to 
describe some of these techniques 
briefly and to show how they over- 
came problems which have limited
the usefulness of the contingency
method. Problems to be considered
are concerned with such aspects as
small samples, indices of relationship,
specification of hypotheses, higher-
order interactions, and computation-
al procedures.

1 A preliminary version of this paper was
read before the Division on Evaluation and
Measurement (Div. 5) of APA in New York,
September 1957.


SMALL SAMPLES 

One problem arises in analysis of 
contingency tables when the data 
constitute a “small sample.” The 
statistical theory presupposes “large 
samples” which are not always con- 
veniently available. The definition 
of “small sample” has been arbitrary.
Karl Pearson suggested that any ex- 
pected cell frequency below 10 is 
small, while Fisher set 5 as the limit. 
The typical solution to this problem 
has been to pool rows or columns so 
as to eliminate small expected fre- 
quencies or to use Yates’ correction 
(Yates, 1934) for a 2X2 table. 

Cochran (1952) believes that no 
rule of thumb is entirely adequate, 
and he indicated that, in a 2X2 
table, the magnitude of all four ex-
pected frequencies affects the quality
of the approximation. His paper 
clarifies the choice of tests when one 
has small samples or small cell fre- 
quencies. Since his rules shed new 
light on proper decisions to be made 
in analyzing small sample data and 
since they have not been readily 
available to behavioral researchers,
they are given here in full.

Summary Recommendations for the
Use of X²

I. Attribute data. The data come
to us in grouped form. Pooling of
classes is considered undesirable be-
cause of loss of power.

1. The 2X2 table. Use Fisher’s
exact test (a) if n<20, (b) if 20<n
<40 and the smallest expectation is
less than 5. Mainland’s tables...
are helpful in all such cases. If
n>40, use X², corrected for conti-
nuity if the smallest expectation is
less than 5.

2. Tables with degrees of freedom
between 2 and 60 and all expecta-
tions less than 5. If n is so small that
Fisher’s exact test can be computed
without excessive labor, use this.
Otherwise use X², considering wheth-
er this needs correction for continuity
by finding the next largest value of
X².

3. Tables with degrees of freedom
greater than 60 and all expectations
less than 5. Try to obtain the exact
mean and variance of X² and use the
normal approximation to the exact
distribution.

4. Tables with more than 1 df and
some expectations greater than 5.
Use X² without correction for con-
tinuity.

[...] to the levels recommended by Wil-
liams (12 per cell for n=200, 20 per
[...] per cell for [...] Pool (if nec-
essary) so that the minimum expecta-
tion is 1.

It should be noted that Cochran’s 
n is the total number of cases in the 
contingency table. The symbol X² is
what Cochran has suggested for the
value of chi square obtained by sub-
stituting empirical data in the formu-
la. The Greek symbol “χ²” is reserved
for the tabled values given by the 
theoretical distribution. 
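
The rule quoted for the 2X2 table lends itself to a small decision function. The Python sketch below is only illustrative and is not taken from Cochran's paper: the function name and the returned labels are hypothetical, and the handling of cases the quoted wording leaves open (n exactly 20 or 40, and 20<n<40 with every expectation at least 5) is an assumption.

```python
def recommend_2x2_test(table):
    """Suggest a test for a 2X2 table [[a, b], [c, d]] of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    # Smallest expected frequency under independence.
    smallest = min(r * k / n for r in rows for k in cols)

    if n < 20:
        return "Fisher exact test"
    if n < 40:
        # Assumption: n = 20 is grouped with 20 < n < 40.
        return "Fisher exact test" if smallest < 5 else "X2"
    # Assumption: n = 40 is grouped with n > 40.
    return "X2 corrected for continuity" if smallest < 5 else "X2"
```

For example, `recommend_2x2_test([[3, 1], [1, 3]])` falls under the first branch because only eight cases are tabled.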

Fisher (1946) has shown how a 
test of significance using the exact
probability of an observed table and
certain other configurations of cell
frequencies can be applied to a 2X2
table with small or zero frequencies. 
Freeman and Halton (1951) have ex- 
tended the principle to any number 
of attributes and any number of cate- 
gories per attribute. In general, both 
methods consist of assuming the bor- 
der totals fixed, considering only rela- 
tionships internal to the contingency 
table, considering every possible ar-
ray of cell frequencies with the given
border totals, and applying a test of
significance as follows: (a) all arrays
subject to the same general condi-
tions as observed (i.e., the same bor-
der totals) are enumerated; (b) the
corresponding a priori probabilities
are calculated by means of the appro-
priate probability expression; (c) the
a priori probabilities smaller than or
equal to that of the observed array
are identified, i.e., the probabilities
of all arrays which are a priori as
probable as, or less probable than,
the observed array; (d) all probabili-
ties satisfying the conditions in (c)
are summed to yield the probability
of obtaining an array as probable as
or less probable than the observed
array.
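
For the 2X2 case, steps (a) through (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of Fisher's or Freeman and Halton's own computing scheme: the function name is hypothetical, the hypergeometric expression is taken as the "appropriate probability expression," and a small tolerance is added so that arrays tied with the observed one are counted.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact probability for the 2X2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of the array whose upper-left cell is x,
        # with all border totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)    # smallest admissible upper-left cell
    highest = min(row1, col1)       # largest admissible upper-left cell
    p_observed = prob(a)

    # Sum the probabilities of all arrays as probable as, or less
    # probable than, the observed array.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
```

With the table 3, 0 / 0, 3 there are four admissible arrays with probabilities 1/20, 9/20, 9/20, and 1/20, so the function returns 1/20 + 1/20 = .10.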

Fisher’s technique has appeared in 
a number of textbooks to date, but 
the technique of Freeman and Hal- 
ton seems not to have caught on. 
The computational labor in either 
technique is tedious, since it involves 
the quotient of the products of sets 
of factorials. However, by using ap- 
propriate tables which are based up- 
on Fisher’s formula and which yield 
the approximate significance level for
2X2 data of various sample sizes
and marginal totals, one may save
much of the labor. Such tables, rang-
ing collectively up to sample sizes
of 50, have been published by Arm-
sen (1955), Finney (1948), Latscha
(1953), and Mainland (1948).


INDICES OF RELATIONSHIP 


We sometimes wish to express the 
degree of association in a two-way 
contingency table with a significant 
chi square. We would prefer a coeffi- 
cient similar to that for product- 
moment correlation. Some problems 
have arisen in attempts to develop 
such indices. Some that have been 
devised are the coefficient of mean 
square contingency, the phi coeffi- 
cient, tetrachoric correlation, the 
point-biserial coefficient, the coeffi- 
cient of association, and the coeffi- 
cient of colligation. 

Inferences about the degree of as- 
sociation based upon the numerical 
size of a coefficient can be mislead- 
ing. A number of authors (Guilford, 
1936; Johnson & Jackson, 1953; Ken- 
dall, 1947) have called attention to 
the fallacies of such inferences. In 
general, the coefficients often fail to 
satisfy the desiderata of Kendall 
(1947, p. 310), namely, that “(a) it
shall vanish when the associations 
are independent; (b) it shall be +1 
when there is complete positive as- 
sociation and —1 when there is com- 
plete negative association; (c) it 
should increase as the frequencies 
proceed from dissociation to associ- 
ation.” In using these indices, one 
does not have the same intuitive feel- 
ing for strength of relationship as in 
the case of product-moment corre- 
lation. The latter has a stable, com- 
prehensive frame of reference to aid 
in the interpretation. Also, the
sampling distribution of the statistic
is well known.

Kendall (1947) pointed out that 
there is a distinction between the 
concepts of association and correla- 
tion. For example, it is possible to
have strong association between two
variables, when the correlation be- 
tween them is zero. The discrepancy 
comes from the fact that the two 
types of conclusions arise from two 
types of hypotheses. In testing for 
association, we consider all types of 
departures from independence; in 
testing for correlation we consider 
a much more limited kind of alterna- 
tive hypothesis. 

All of the indices previously men- 
tioned, other than product-moment 
correlation, are subject to criticism. 
Let us single out one of them to illus- 
trate some weaknesses of such de- 
vices. The coefficient of mean-square 
contingency, C, is purported to be 
comparable to the Pearson product-
moment correlation coefficient, r,
under some conditions. Guilford
(1936, p. 357) has pointed out that
C becomes identical with r under the
following conditions: “(a) The vari- 
ables are of the continuous type; 
(b) N is large; (c) the number of 
classes is sufficient to overcome errors 
of grouping; and (d) the distribu- 
tions are normal.” A number of 
weaknesses can be pointed out in at- 
tempts to meet these assumptions. 
Regarding (a), some data are either 
incapable of expression in terms of 
continuous scales or are difficult to 
quantify. Indeed, some data are in- 
herently qualitative and by defini- 
tion are nonquantifiable. Regarding 
(b), it is not always feasible to collect 
a large sample. Regarding (c), the 
number of classes used in practice 
will rarely be large enough so that C 
approaches 1.00 (the upper limit of
the product-moment correlation co-
efficient). One formula yields the
value of the upper limit of C for a
tXt table. For example, it has been
shown that in a 2X2 table, C cannot
exceed .707; in a 4X4 table, C can-
not exceed .866; in a 5X5 table, C
cannot exceed .894; and in a 10X10
table, C cannot exceed .949. Regard-
ing (d), it is not always convenient
or even possible to have both the dis-
tributions normal.
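
The limits cited for C can be checked numerically. The sketch below assumes the usual definition C = sqrt(X²/(N + X²)) and the usual expression sqrt((t - 1)/t) for the upper limit in a table with t rows and t columns; the function names are hypothetical.

```python
from math import sqrt


def contingency_coefficient(chi_square, n):
    """Coefficient of mean-square contingency from chi square and N."""
    return sqrt(chi_square / (n + chi_square))


def max_contingency_coefficient(t):
    """Upper limit of C for a table with t rows and t columns."""
    return sqrt((t - 1) / t)


for t in (2, 4, 5, 10):
    # Prints .707, .866, .894, and .949 for t = 2, 4, 5, and 10.
    print(t, round(max_contingency_coefficient(t), 3))
```

Because the limit depends on t, two values of C are comparable only when they come from tables of the same size.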

Rather than computing the mean- 
square contingency, one could con- 
solidate rows and columns of a larger 
table into a 2X2 table and compute 
any of several indices for a fourfold 
table. However, information is thus 
wasted, and the indices themselves
leave something to be desired.

One new solution for the problem 
is the use of scores for rows and col- 
umns and the calculation of regression 
coefficients and correlation coeffi- 
cients from such scores (Cochran, 
1954; Yates, 1948; Williams, 1952). 
Such techniques result in more sensi-
tive tests for alternative hypotheses.
The general procedure is to first test 
the hypothesis of independence by 
the over-all chi square test. If the 
hypothesis is rejected, and one con- 
cludes that association does, in fact, 
exist, he may use scoring methods to 
partition out the portion of the as- 
sociation which may be explained by 
correlation or regression. It is easiest
to explain association as being due to
the linear correlation of two under-
lying variates. In some cases, it may
be necessary to resort to one or more
additional pairs of variates as mani-
fested by new sets of row and column
scores.

Two kinds of scores are feasible for
contingency tables: arbitrary, or a
priori, scores and empirical scores.
Arbitrary scores are chosen along
some convenient scale according to
a knowledge of the kind of data it-
self. The use of such scores serves to
make some persons uncomfortable.
However, Cochran (1954, p. 436) de-
fends the use of arbitrary scores when
they have embodied “the best insight
available about the way in which the
classification was constructed and 
used.” Furthermore, he pointed out 
that “any set of scores gives a valid 
test, provided that they are con- 
structed without consulting the re- 
sults of the experiment.” He goes on 
to say that “If the set of scores is 
poor, in that it badly distorts a nu- 
merical scale that really does under- 
lie the ordered classification, the test 
will not be sensitive.” 
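
The use of arbitrary scores to partition the over-all chi square can be sketched as follows. The sketch is an illustration under assumptions rather than the exact computing form of Cochran, Yates, or Williams: the function name is hypothetical, and the linear component is taken as N r² with 1 df (some presentations use (N - 1) r² instead), where r is the correlation between the assigned row and column scores.

```python
def linear_component(table, row_scores, col_scores):
    """Split the over-all chi square into a 1-df linear part and a remainder."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)

    # Over-all chi square for independence.
    total = sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

    # Product-moment correlation between the assigned scores.
    mx = sum(w * x for w, x in zip(rows, row_scores)) / n
    my = sum(w * y for w, y in zip(cols, col_scores)) / n
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(rows, row_scores))
    syy = sum(w * (y - my) ** 2 for w, y in zip(cols, col_scores))
    sxy = sum(
        obs * (row_scores[i] - mx) * (col_scores[j] - my)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    r = sxy / (sxx * syy) ** 0.5

    linear = n * r * r  # assumption: N r^2; (N - 1) r^2 is also used
    return total, linear, total - linear, r
```

For the hypothetical table 0, 20 / 20, 0 / 0, 20 with row scores -1, 0, 1 and column scores 0, 1, the over-all chi square is 60 while the linear part is zero, which illustrates Kendall's distinction between association and correlation.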

Empirical scores are derived from 
the data themselves by statistical 
computation so that the correlation 
between row and column scores is a 
maximum. Such scores are optimal 
in the sense that they yield the high-
est correlation possible for any set of
scores that could be chosen. Thus,
they are an improvement over arbi-
trary scores. The procedures for cal-
culating empirical scores grew out of
work begun by Fisher (1946) who
considered contingency tables from
the point of view of discriminant
analysis. Suppose that we wish to as-
sign scores to rows and columns.
What are the best scores to assign so
that a linear function of row and col-
umn scores will best differentiate the
classes determined by the columns,
and vice versa? This turns out to be
a problem in maximizing the corre-
lations between the scores, and the
required correlations are those known
as “canonical” in the sense of Hotel-
ling (1936). Work in this area was
continued by Maung (1941). The
methods were applied to a practi-
cal problem of quantifying letter
grades of college students by John-
son (1950). Bock (1957) showed that
the empirical scoring scheme of Wil-
liams is similar theoretically to the
techniques of Maung (1941), John-
son (1950), and Guttman (1941). He
gives the name “optimum scaling”
to the general theory and shows rela-
tively simple computational proce-
dures for solving the necessary ma- 
trices. His approach to scaling is par- 
ticularly appropriate if the data are 
to be used in analysis of variance or 
multivariate analysis. 
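
One simple way to obtain scores that maximize the row-column correlation is an alternating-averages iteration, sketched below. This is not claimed to be the computing routine of Bock or of any author cited here; the function name, the starting scores, and the stopping rule are assumptions, and every row and column total is assumed to be greater than zero.

```python
def empirical_scores(table, iterations=500, tol=1e-12):
    """Row scores, column scores, and their maximized correlation."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)

    def standardize(values, weights):
        # Weighted mean 0 and variance 1; also return the original spread.
        mean = sum(w * v for w, v in zip(weights, values)) / n
        centered = [v - mean for v in values]
        sd = (sum(w * v * v for w, v in zip(weights, centered)) / n) ** 0.5
        if sd < 1e-12:
            return [0.0] * len(values), 0.0
        return [v / sd for v in centered], sd

    # Arbitrary starting scores for the columns.
    y, _ = standardize([float(j) for j in range(len(cols))], cols)
    x, rho = [0.0] * len(rows), 0.0
    for _ in range(iterations):
        # Each row score is the mean column score of the cases in that row.
        x = [sum(nij * yj for nij, yj in zip(row, y)) / ri
             for row, ri in zip(table, rows)]
        x, _ = standardize(x, rows)
        # Each column score is the mean row score of the cases in that column;
        # its spread before rescaling equals the current correlation.
        y_raw = [sum(row[j] * xi for row, xi in zip(table, x)) / cj
                 for j, cj in enumerate(cols)]
        y, new_rho = standardize(y_raw, cols)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return x, y, rho
```

For a 2X2 table the maximized correlation equals the absolute value of the phi coefficient; for the hypothetical table 30, 10 / 10, 30 the function returns .50.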

Several of the above references 
(Bock, 1956; Bock, 1957; Cochran, 
1954; Fisher, 1946; Johnson, 1950; 
Williams, 1952) have described ways 
of testing concordance of scores. In 
testing concordance, one asks two 
questions: First, of the total depar- 
tures from expectation, how much 
can be explained by a set of scores, 
and second, how much is not ex- 
plained by the set of scores? Thus, 
we may discover whether any given 
set of scores is sufficient to explain 
a significant amount of the associa- 
tion between two classifications and 
whether only one set of scores is suf- 
ficient. In answering these ques- 
tions, we have access to partitioning 
of chi square or analysis of variance 
techniques. 

In summary, arbitrary scores can 
be useful in instances in which one is 
familiar with the underlying quanti- 
tative basis of the classifications and 
when one wishes to save computa- 
tional labor at some loss in accuracy; 
on the other hand, empirical scores 
might be used when one is unfamiliar 
with the underlying quantitative 
basis or wishes a more accurate set of 
scores. 

Illustrations of scoring techniques 
as applied to data in the behavioral 
sciences have been given by Mayo 
(1957, 1958). 

Stuart (1953) devised a correlation 
coefficient for two-way tables which 
is a variety of Kendall's tau. His 
formulas, however, do not require 
scores for rows and columns as do 
those of Cochran, Yates, and Wil- 
liams. Rather, his coefficient de- 
pends only on ordinal properties and 
was offered as a device to measure
strength of association. He showed 
how the existing theory of the coef- 
ficient may be used to estimate the 
population association, to set confi- 
dence limits for it, and also to test 
the differences in the coefficients cal- 
culated for two contingency tables. 
A general review of the methodol- 
ogy of measures of association for 
contingency tables with two or more 
attributes, together with a clarifica- 
tion and discussion of some of the 
underlying concepts was given by 
Goodman and Kruskal (1954, 1959). 
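
A coefficient of the kind Stuart proposed, which depends only on the ordering of the rows and columns, can be sketched as follows. The formula used is the one commonly given for Stuart's tau-c, 2m(P - Q)/[n²(m - 1)], where P and Q are the numbers of concordant and discordant pairs and m is the smaller of the number of rows and columns; the function name is hypothetical, and the sketch does not cover Stuart's confidence limits or tests.

```python
def stuart_tau_c(table):
    """Ordinal association for a two-way table with ordered categories."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(map(sum, table))
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # rows below row i
                for l in range(n_cols):
                    if l > j:                   # lower and to the right
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                 # lower and to the left
                        discordant += table[i][j] * table[k][l]
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n * n * (m - 1))
```

No scores are assigned to the categories; only the order of the rows and columns enters the calculation.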


SPECIFICATION OF HYPOTHESES 
UNDER TEST 


The usual chi square test of inde- 
pendence between two classifications 
can be very useful in the exploratory, 
or pilot, stages of research, when one 
does not or cannot specify the alter- 
native hypotheses. For nonsignifi- 
cant results, one probably would not 
inquire further; however, once signifi- 
cance is demonstrated, it is well to in- 
quire as to what alternative hypothe- 
ses might be plausible and to test 
these empirically. For example, if 
one were interested in explaining as- 
sociation by assuming linear correla- 
tion between two underlying quanti- 
tative variates, the coefficient of 
mean-square contingency would be 
misleading. This coefficient is based 
upon obtained chi square which sub- 
sumes all forms of departure from 
expectation, rather than departures 
due to linear correlation which are 
only one kind of departure. The 
scoring techniques previously men-
tioned constitute one solution to the
problem of testing more specific al-
ternative hypotheses. In addition to
testing for significant regression co-
efficients and for correlation, one
may also test for homogeneity of
row or column means, and for differ-
ences between pairs of row or column
means. The analogy to the t test and
analysis of variance is apparent here.

Distinction among three different
kinds of sampling processes which
might have yielded the same 2X2
table of data was made by Barnard
(1947) and Pearson (1947). These
authors maintained that the abstract
configuration of a given 2X2 table
could have meaning for at least three
different classes of empirical data,
depending upon the sampling proc-
ess and the assumptions. Barnard
used three kinds of “urn experi-
ments” as models and called the
three classes the (a) 2X2 Independ-
ence Trial, (b) 2X2 Comparative
Trial, and (c) Double Dichotomy.
He also maintained that Fisher’s ex-
act formula applied only to the 2X2
Independence Trial and presented
different formulas for the other two
classes. Pearson designated the first
two classes of Barnard as Problem I
and Problem II, respectively. It was
pointed out, however, that for large
samples, the results tend to approach
each other asymptotically. A discus-
sion of the theory of the power func-
tion, computational formulas for the
function and some illustrative tables
of the function were presented for
Problem I by Pearson and Merring-
ton (1948) and for Problem II by
Patnaik (1948).

Cochran (1950) has given the
theory and application of a test of
significance for a 2X2 contingency
table in which there is correlation be-
tween the observations themselves in
the cells. Such a situation occurs in
practice when the same individuals
are observed under different treat-
ments.

Snedecor (1946) has described and
illustrated a technique of testing the
discrepancies of several 2X2 samples
which represent replications of the
same experiment in regard to the evi-
dence which they furnish regarding
the hypothesis under test. His illus-
tration was for a single attribute with
expected frequencies given a priori.
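
A test of the discrepancies among replicated samples, for a single attribute with expected proportions given a priori, is often carried out as a heterogeneity chi square. The sketch below assumes that form: the sum of the chi squares of the separate samples, less the chi square of the pooled sample. Whether this matches the cited illustration in every detail is not confirmed here, and the function name is hypothetical.

```python
def heterogeneity_chi_square(samples, expected_proportions):
    """Return (total, pooled, heterogeneity) chi squares.

    samples: one list of category counts per replication.
    expected_proportions: a priori proportions, one per category.
    """
    def chi_square(counts):
        n = sum(counts)
        return sum(
            (obs - n * p) ** 2 / (n * p)
            for obs, p in zip(counts, expected_proportions)
        )

    # Sum over replications: df = k * (categories - 1).
    total = sum(chi_square(sample) for sample in samples)
    # Pooled over replications: df = categories - 1.
    pooled = chi_square([sum(column) for column in zip(*samples)])
    # Heterogeneity: df = (k - 1) * (categories - 1).
    return total, pooled, total - pooled
```

With two hypothetical replications 30, 10 and 10, 30 and expected proportions .5 and .5, the pooled chi square is zero while the heterogeneity chi square is 20: the replications contradict each other even though their sum agrees with the hypothesis.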


HIGHER-ORDER INTERACTIONS 


Interactions among three or more 
classifications are often of intrin- 
sic interest in contingency analysis. 
However, most of the development 
of concepts for understanding and 
techniques for analyzing higher-order 
interactions has come about rather 
recently.

Simpson (1951) pointed out the 
possibilities for fallacious conclusions 
that might be drawn from data in 
two-way form where certain effects 
are obscured, while if the data had 
been classified in a three-way form, 
the covert relationships would have 
shown up. The same effect applied to 
test item interactions has been called
“Meehl’s Paradox” by Fricke (1956).
The effect also appeared in a descrip-
tion of latent structure analysis by
Lazarsfeld (1954). The analogue for
continuous variables is the well
known argument against doing sepa-
rate t tests rather than a single fac-
torial experiment.
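
The effect Simpson described can be reproduced with a small numerical example. The counts below are hypothetical and chosen only for illustration; the function uses the familiar shortcut formula for chi square in a 2X2 table.

```python
def chi_square_2x2(table):
    """Chi square for [[a, b], [c, d]] without expected frequencies."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Two layers of a hypothetical 2X2X2 table; neither shows any association.
layer_1 = [[80, 20], [8, 2]]
layer_2 = [[2, 8], [20, 80]]

# Collapsing over the third classification gives the two-way table.
pooled = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(layer_1, layer_2)]

print(chi_square_2x2(layer_1), chi_square_2x2(layer_2))  # both are zero
print(round(chi_square_2x2(pooled), 2))                  # large by comparison
```

The pooled table 82, 28 / 28, 82 suggests a strong association that is absent within either layer, so a conclusion drawn from the two-way form alone would be fallacious.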

At present, there are available
three approaches toward the assess-
ment of higher-order interaction. It
is not clear just how these three are
related, or whether they are independ- 
ent approaches to the same prob- 
lem. There appears to be much to be 
done by both mathematical statis- 
ticians and by the applied researcher 
in clarifying the distinctions among 
these three techniques. 

The simplest case of higher-order
interaction is that given in the
2X2X2 table. A test for this
case was first described by Bartlett
(1935), who wrote the general formu-
la for a cubic equation in which the
independent variable x is the devia-
tion of each observation in the
2X2X2 table from the corresponding
expected values. Having solved the 
cubic equation for x, it was easy to 
compute chi square by means of a 
formula which he gave. Bartlett also 
treated more complex tables of more 
than three dichotomized attributes. 
Norton (1945) maintained that the 
calculation of chi square for the com- 
plex table of dichotomized attributes 
is purely a computational difficulty. 
He presented the algebraic model 
and an approximate method of com- 
puting chi square for such a table. 
However, Norton’s method is com- 
putationally tedious. Kastenbaum 
and Lamphiear (1959) demonstrated 
an iterative technique for solving the 
general three-way table which, while 
practical for a desk calculator, is par- 
ticularly well suited for modern high- 
speed computers. It is of interest to 
note that a general computer pro- 
gram covering certain selected cases 
of a three-way table up to size 
55X16 is available at Oak Ridge 
National Laboratory. Illustrations 
of 2X2X2 problems have been given
from biometrics by Snedecor (1946) 
and from the behavioral sciences by 
Mayo (1957). 
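
Bartlett's test for the 2X2X2 table can be sketched numerically. The sketch assumes the condition as it is usually presented: a single deviation x is added to four cells and subtracted from the other four so that the cross-product ratios of the two layers become equal, and chi square is x² times the sum of the reciprocals of the fitted frequencies. The root is located here by bisection rather than by solving the cubic directly; the function name is hypothetical, and all eight cells are assumed to be positive.

```python
from math import log


def bartlett_2x2x2(t):
    """Deviation x and chi square for three-factor interaction.

    t[i][j][k] holds the observed count for levels i, j, k (each 0 or 1).
    """
    # Cells whose fitted value is n + x, and cells whose fitted value is n - x.
    plus = [t[0][0][0], t[1][1][0], t[0][1][1], t[1][0][1]]
    minus = [t[0][1][0], t[1][0][0], t[0][0][1], t[1][1][1]]

    def f(x):
        # Zero when the products of the two sets of fitted values are equal.
        return sum(log(n + x) for n in plus) - sum(log(n - x) for n in minus)

    lo, hi = -min(plus), min(minus)   # keeps every fitted value positive
    for _ in range(200):              # f is increasing, so bisection works
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2

    chi_square = x * x * (
        sum(1 / (n + x) for n in plus) + sum(1 / (n - x) for n in minus)
    )
    return x, chi_square
```

For hypothetical counts in which one layer is 30, 10 / 10, 30 and the other is 10, 10 / 10, 10, the deviation is -10/3 and chi square is 7.5 with 1 df.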

Another technique for testing high- 
er-order interactions utilizes approxi- 
mation by means of the likelihood 
ratio criterion and has been described 
and illustrated by Mayo (1957). In 
this technique, one can test a number 
of different kinds of higher-order in- 
teractions. For example, in a single 
four-way table, in addition to the six 
simple interactions of pairs of attri- 
butes, one may test 24 different null 
hypotheses about higher-order inter- 
actions, or a total of 30 null hypothe- 
ses for the table. Thus, one may 
test (a) mutual independence among 
all four attributes; (b) mutual inde-
pendence among any three attri-
butes; (c) independence between any 
two attributes; (d) independence be- 
tween one attribute and a combina- 
tion of the remaining three; (e) inde- 
pendence between one attribute and 
a combination of the remaining two; 
and (f) independence between a com- 
bination of any two attributes with a 
combination of the other two. 
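
The count of 30 null hypotheses for a four-way table can be verified by enumeration. The sketch below simply lists the hypotheses of types (a) through (f) for four attributes labeled A, B, C, and D; the labels and the variable names are hypothetical.

```python
from itertools import combinations

attrs = ("A", "B", "C", "D")

hypotheses = {
    # (a) mutual independence among all four attributes
    "a": [attrs],
    # (b) mutual independence among any three attributes
    "b": list(combinations(attrs, 3)),
    # (c) independence between any two attributes
    "c": list(combinations(attrs, 2)),
    # (d) one attribute against the combination of the remaining three
    "d": [(x, tuple(y for y in attrs if y != x)) for x in attrs],
    # (e) within any three attributes, one against the combination of the other two
    "e": [(x, tuple(y for y in trio if y != x))
          for trio in combinations(attrs, 3) for x in trio],
    # (f) a pair against the complementary pair (each split counted once)
    "f": [(pair, tuple(y for y in attrs if y not in pair))
          for pair in combinations(attrs, 2) if attrs[0] in pair],
}

counts = {kind: len(items) for kind, items in hypotheses.items()}
total = sum(counts.values())
higher_order = total - counts["c"]
print(counts, total, higher_order)
```

The six hypotheses of type (c) are the simple interactions of pairs; the remaining 1 + 4 + 4 + 12 + 3 = 24 concern higher-order interactions.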

A third technique for testing high- 
er-order interactions was given by 
Lancaster (1951). His estimate of 
higher-order interaction is based up- 
on the partition of chi square. It is 
applicable whether the parameters 
used are given a priori or are esti- 
mated from the data. It is also gen- 
eral for any number of attributes 
and any number of categories. Al- 
though Lancaster's component for 
interaction is different from Bart- 
lett’s, he shows that they are asymp- 
totically the same. Lancaster’s meth- 
od has the advantage of being com- 
putationally simpler, although it is 
not made clear just what hypotheses 
are being tested. 

Simpson (1951) has clarified the 
interpretation of interaction in con- 
tingency tables to some extent; he 
has compared the specific interpreta- 
tions of Bartlett and Lancaster; he 
has also pointed out some pitfalls to 
be avoided in the interpretation of 
interactions. Illustrations of higher-
order interactions from the behavior-
al sciences have been given by Mayo
(1957) and by Sutcliffe and Haber- 
man (1956). 


COMPUTATIONAL PROCEDURES 


The computational labor involved
in applying some analyses to contin-
gency data has been prohibitive; ex-
amples are data involving a large
number of cells, higher-order inter-
actions, exact tests, and iterative
procedures for a series of like data
such as item analysis or large scale
[...] interpretation.

468 SAMUEL T. MAYO

The
problem of computational labor has
been attacked in a number of ways. 
For Fisher’s exact test, the tabled 
probabilities for a great many 2X2 
configurations and sample sizes by 
Armsen, Finney, Latscha, and Main- 
land have already been mentioned. 
Also, for the usual chi-square test 
there are a number of formulas which 
do not require the computation of ex- 
pected frequencies as an intermedi-
ate step. Such a formula for the 2X2
case is well known and has appeared
in a number of textbooks; the formu-
la for the rX2 case is also well known;
it sometimes goes under the name of 
the “Brandt-Snedecor” formula and 
has also appeared in a number of text- 
books. A similar formula for the 
rXs case is less well known. To the 
author’s knowledge, it has not ap- 
peared in a textbook, although it was 
published in two rather specialized 
journals (Carroll & Bennett, 1950; 
Leslie, 1951). It has been known by 
some research workers at universities 
and in military research; however, it 
does not seem to be as generally 
known as the other formulas for chi 
square. A computing routine for the 
rXs formula was described by Mayo
(1959). In one case, use of the formu- 
la reduced the number of machine 
and pencil operations by one half. 
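
A chi square formula for the rXs table that avoids expected frequencies can be sketched as follows. The identity used, N times (the sum over cells of the squared frequency divided by the product of its row and column totals, minus 1), is a standard algebraic rearrangement of the usual formula; whether it is the exact form published by Carroll and Bennett or by Leslie is not confirmed here, and the function name is hypothetical.

```python
def chi_square_direct(table):
    """Chi square for an r X s table without computing expected frequencies."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    # Each cell contributes its squared frequency over the product
    # of its row total and column total.
    s = sum(
        obs * obs / (rows[i] * cols[j])
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return n * (s - 1)
```

For the hypothetical table 30, 10 / 10, 30 the sum of the four terms is 1.25, so chi square is 80(1.25 - 1) = 20, the same value the usual formula gives.
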
An approximate graphical tech- 
nique for determining significance 
level for the 2X2 table by the simple 
addition and subtraction of cell fre- 
quencies was described by Trites 
(1957). The sample sizes tabled have 
a lower limit of 40, which is approxi- 
mately equal to the upper limit for 
the exact functions previously re- 
ferred to, while the upper limit is 200.
To use this test, one must draw inde-
pendent samples; for maximum use- 
fulness, the two samples should be of 
equal size, although the author does 
show how some cases can be handled 
in a more approximate fashion. 
Another approximate, graphical tech- 
nique was devised by Bross and 
Kasten (1957); it does not require 
equal samples and is amenable for 
cases in which one column total is as 
large as 49.

With the advent of electronic com- 
puters, one should find that some old- 
er, formerly prohibitive techniques 
will become feasible and probably
newer computing programs will be-
come available.


SUMMARY 


The contingency principle for clas- 
sifying, analyzing, and interpreting 
categorical data has been well known 
by research workers for several dec- 
ades. Not until the last decade, 
however, has it realized more of its 
potential usefulness as it has been 
applied to a wider range of data. 
Some problems inherent in the fur-
therance of its usefulness were dis-
cussed as well as solutions for these 
problems. 

The analytical techniques treated
here and those additional ones sure
to come in the near future promise:
(a) improved interpretation of con-
tingency data of the usual kinds;
(b) means of quantifying qualitative
data so as to provide additional vari-
ables for research investigations; and
(c) contributions to both theory and
practice in configural scoring and pat-
tern analysis problems, when one is
interested in higher-order interac-
tions.